{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<style>\n",
    "pre {\n",
    " white-space: pre-wrap !important;\n",
    "}\n",
    ".table-striped > tbody > tr:nth-of-type(odd) {\n",
    "    background-color: #f9f9f9;\n",
    "}\n",
    ".table-striped > tbody > tr:nth-of-type(even) {\n",
    "    background-color: white;\n",
    "}\n",
    ".table-striped td, .table-striped th, .table-striped tr {\n",
    "    border: 1px solid black;\n",
    "    border-collapse: collapse;\n",
    "    margin: 1em 2em;\n",
    "}\n",
    ".rendered_html td, .rendered_html th {\n",
    "    text-align: left;\n",
    "    vertical-align: middle;\n",
    "    padding: 4px;\n",
    "}\n",
    "</style>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Machine Learning: the Titanic dataset\n",
    "\n",
    "If you want to try out this notebook with a live Python kernel, use mybinder:\n",
    "\n",
    "<a class=\"reference external image-reference\" href=\"https://mybinder.org/v2/gh/vaexio/vaex/latest?filepath=docs%2Fsource%2Fexample_ml_titanic.ipynb\"><img alt=\"https://mybinder.org/badge_logo.svg\" src=\"https://mybinder.org/badge_logo.svg\" width=\"150px\"></a>\n",
    "\n",
    "\n",
    "In the following is a more involved machine learning example, in which we will use a larger variety of methods in `veax` to do data cleaning, feature engineering, pre-processing and finally to train a couple of models. To do this, we will use the well known _Titanic dataset_. Our task is to predict which passengers are more likely to have survived the disaster. \n",
    "\n",
    "Before we begin, there are two important notes to consider:\n",
    " - The following example is not to provide a competitive score for any competitions that might use the _Titanic dataset_. It's primary goal is to show how various methods provided by `vaex` and `vaex.ml` can be used to clean data, create new features, and do general data manipulations in a machine learning context. \n",
    " - While the _Titanic dataset_ is rather small in side, all the methods and operations presented in the solution below will work on a dataset of arbitrary size, as long as the data fits on the hard-drive of your machine.\n",
    " \n",
    "Now, with that out of the way, let's get started!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:37.005009Z",
     "start_time": "2020-05-01T17:12:35.667407Z"
    }
   },
   "outputs": [],
   "source": [
    "import vaex\n",
    "import vaex.ml\n",
    "\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adjusting `matplotlib` parmeters\n",
    "\n",
    "_Intermezzo:_ we modify some of the `matplotlib` default settings, just to make the plots a bit more legible."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:37.014957Z",
     "start_time": "2020-05-01T17:12:37.007951Z"
    }
   },
   "outputs": [],
   "source": [
    "SMALL_SIZE = 12\n",
    "MEDIUM_SIZE = 14\n",
    "BIGGER_SIZE = 16\n",
    "\n",
    "plt.rc('font', size=SMALL_SIZE)          # controls default text sizes\n",
    "plt.rc('axes', titlesize=SMALL_SIZE)     # fontsize of the axes title\n",
    "plt.rc('axes', labelsize=MEDIUM_SIZE)    # fontsize of the x and y labels\n",
    "plt.rc('xtick', labelsize=SMALL_SIZE)    # fontsize of the tick labels\n",
    "plt.rc('ytick', labelsize=SMALL_SIZE)    # fontsize of the tick labels\n",
    "plt.rc('legend', fontsize=SMALL_SIZE)    # legend fontsize\n",
    "plt.rc('figure', titlesize=BIGGER_SIZE)  # fontsize of the figure title"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get the data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First of all we need to read in the data. Since the _Titanic dataset_ is quite well known for trying out different classification algorithms, as well as commonly used as a teaching tool for aspiring data scientists, it ships (no pun intended) together with `vaex.ml`. So let's read it in, see the description of its contents, and get a preview of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:37.069863Z",
     "start_time": "2020-05-01T17:12:37.017532Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>.vaex-description pre {\n",
       "          max-width : 450px;\n",
       "          white-space : nowrap;\n",
       "          overflow : hidden;\n",
       "          text-overflow: ellipsis;\n",
       "        }\n",
       "\n",
       "        .vex-description pre:hover {\n",
       "          max-width : initial;\n",
       "          white-space: pre;\n",
       "        }</style>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div><h2>titanic</h2> <b>rows</b>: 1,309</div><h2>Columns:</h2><table class='table-striped'><thead><tr><th>column</th><th>type</th><th>unit</th><th>description</th><th>expression</th></tr></thead><tr><td>pclass</td><td>int64</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>survived</td><td>bool</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>name</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>sex</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>age</td><td>float64</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>sibsp</td><td>int64</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>parch</td><td>int64</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>ticket</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>fare</td><td>float64</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>cabin</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>embarked</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>boat</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>body</td><td>float64</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>home_dest</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr></table><h2>Data:</h2><table>\n",
       "<thead>\n",
       "<tr><th>#                                </th><th>pclass  </th><th>survived  </th><th>name                                           </th><th>sex   </th><th>age   </th><th>sibsp  </th><th>parch  </th><th>ticket  </th><th>fare    </th><th>cabin  </th><th>embarked  </th><th>boat  </th><th>body  </th><th>home_dest                      </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i>    </td><td>1       </td><td>True      </td><td>Allen, Miss. Elisabeth Walton                  </td><td>female</td><td>29.0  </td><td>0      </td><td>0      </td><td>24160   </td><td>211.3375</td><td>B5     </td><td>S         </td><td>2     </td><td>nan   </td><td>St Louis, MO                   </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i>    </td><td>1       </td><td>True      </td><td>Allison, Master. Hudson Trevor                 </td><td>male  </td><td>0.9167</td><td>1      </td><td>2      </td><td>113781  </td><td>151.55  </td><td>C22 C26</td><td>S         </td><td>11    </td><td>nan   </td><td>Montreal, PQ / Chesterville, ON</td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i>    </td><td>1       </td><td>False     </td><td>Allison, Miss. Helen Loraine                   </td><td>female</td><td>2.0   </td><td>1      </td><td>2      </td><td>113781  </td><td>151.55  </td><td>C22 C26</td><td>S         </td><td>--    </td><td>nan   </td><td>Montreal, PQ / Chesterville, ON</td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i>    </td><td>1       </td><td>False     </td><td>Allison, Mr. Hudson Joshua Creighton           </td><td>male  </td><td>30.0  </td><td>1      </td><td>2      </td><td>113781  </td><td>151.55  </td><td>C22 C26</td><td>S         </td><td>--    </td><td>135.0 </td><td>Montreal, PQ / Chesterville, ON</td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i>    </td><td>1       </td><td>False     </td><td>Allison, Mrs. Hudson J C (Bessie Waldo Daniels)</td><td>female</td><td>25.0  </td><td>1      </td><td>2      </td><td>113781  </td><td>151.55  </td><td>C22 C26</td><td>S         </td><td>--    </td><td>nan   </td><td>Montreal, PQ / Chesterville, ON</td></tr>\n",
       "<tr><td>...                              </td><td>...     </td><td>...       </td><td>...                                            </td><td>...   </td><td>...   </td><td>...    </td><td>...    </td><td>...     </td><td>...     </td><td>...    </td><td>...       </td><td>...   </td><td>...   </td><td>...                            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,304</i></td><td>3       </td><td>False     </td><td>Zabour, Miss. Hileni                           </td><td>female</td><td>14.5  </td><td>1      </td><td>0      </td><td>2665    </td><td>14.4542 </td><td>--     </td><td>C         </td><td>--    </td><td>328.0 </td><td>--                             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,305</i></td><td>3       </td><td>False     </td><td>Zabour, Miss. Thamine                          </td><td>female</td><td>nan   </td><td>1      </td><td>0      </td><td>2665    </td><td>14.4542 </td><td>--     </td><td>C         </td><td>--    </td><td>nan   </td><td>--                             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,306</i></td><td>3       </td><td>False     </td><td>Zakarian, Mr. Mapriededer                      </td><td>male  </td><td>26.5  </td><td>0      </td><td>0      </td><td>2656    </td><td>7.225   </td><td>--     </td><td>C         </td><td>--    </td><td>304.0 </td><td>--                             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,307</i></td><td>3       </td><td>False     </td><td>Zakarian, Mr. Ortin                            </td><td>male  </td><td>27.0  </td><td>0      </td><td>0      </td><td>2670    </td><td>7.225   </td><td>--     </td><td>C         </td><td>--    </td><td>nan   </td><td>--                             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,308</i></td><td>3       </td><td>False     </td><td>Zimmerman, Mr. Leo                             </td><td>male  </td><td>29.0  </td><td>0      </td><td>0      </td><td>315082  </td><td>7.875   </td><td>--     </td><td>S         </td><td>--    </td><td>nan   </td><td>--                             </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Load the titanic dataset\n",
    "df = vaex.datasets.titanic()\n",
    "\n",
    "# See the description\n",
    "df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Shuffling\n",
    "From the preview of the DataFrame we notice that the data is sorted alphabetically by name and by passenger class.\n",
    "Thus we need to shuffle it before we split it into train and test sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:37.078118Z",
     "start_time": "2020-05-01T17:12:37.072165Z"
    }
   },
   "outputs": [],
   "source": [
    "# The dataset is ordered, so let's shuffle it\n",
    "df = df.shuffle(random_state=31)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Shuffling for large datasets\n",
    "As mentioned in [The Iris tutorial](ml_iris.ipynb), you are likely to get a better performance if you export to disk your shuffled dataset, especially when the dataset is larger in size:\n",
    "\n",
    "```\n",
    "df.shuffle().export(\"shuffled.hdf5\")\n",
    "df = vaex.open(\"shuffled.hdf5\")\n",
    "df_train, df_test = df.ml.train_test_split(test_size=0.2)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Split into train and test\n",
    "Once the data is shuffled, let's split it into train and test sets. The test set will comprise 20% of the data. Note that we do not shuffle the data for you, since vaex cannot assume your data fits into memory, you are responsible for either writing it in shuffled order on disk, or shuffle it in memory (the previous step)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:37.128176Z",
     "start_time": "2020-05-01T17:12:37.080094Z"
    }
   },
   "outputs": [],
   "source": [
    "# Train and test split, no shuffling occurs\n",
    "df_train, df_test = df.ml.train_test_split(test_size=0.2, verbose=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sanity checks\n",
    "\n",
    "Before we move on to process the data, let's verify that our train and test sets are \"similar\" enough. We will not be very rigorous here, but just look at basic statistics of some of the key features.\n",
    "\n",
    "For starters, let's check that the fraction of survivals is similar between the train and test sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:37.731294Z",
     "start_time": "2020-05-01T17:12:37.129879Z"
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA1QAAAEUCAYAAAAspncYAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAw8UlEQVR4nO3deZhkZXn///eHGQRkGBEYMKCAuKEouIzBJaJGI1+3SIBvBFExiWL0i8tPElzRcSEKCkbFCLgBLgRRMCoaxQhuUZJxARwEBQRlUQeQgRkWWe7fH+e0FEV3z+mmuqt6+v26rrqmzvOc5a6emXr6PudZUlVIkiRJkqZuvWEHIEmSJElzlQmVJEmSJE2TCZUkSZIkTZMJlSRJkiRNkwmVJEmSJE2TCZUkSZIkTZMJleaMJF9Lsv+w4xgFSfZL8o2O+65I8pRpXueSJE+fzrEdz9/5c0iSZt9Mtb1JjkvyrkGft+f82yZZnWTBTF1DGmNCpRnVfpmNvW5PcmPP9n5TOVdVPbOqjp+pWPsl2T5JJVk4W9fsqqo+U1XP6LjvTlV15gyHtFbj/Tyn8jk6nH+zJKcmWZPk0iQvWMv+OyT5SpLrk1yV5PDpnkvS3DPI9qk935lJXjoDcb4kyfcGfd6uZrvtna7+G4BV9euqWlRVtw3g3ElyWJKr29fhSTLBvvv1/du6oW37HjPVc2nuGLlfFLVuqapFY++TXAK8tKq+2b9fkoVVdetsxjbK5uLPI8mCQTRcd8OHgT8CWwGPBE5LcnZVrejfMck9gNPbY54P3AY8eDrnkjQ3dW2f5qv2l/xU1e0jEMuw28QDgD2AXYCiaT8uBo7u37GqPgN8Zmw7yUuAQ4AfT/Vcmjt8QqWhSPKUJJcleX2S3wKfTHLv9onByiR/aN/ft+eYP939G7tjl+R97b6/SvLMSa73+iSXt08jLkjytLZ8vSRvSHJRe6foc0k2aw/7Tvvnte1dpsd3+Fx/nmR5kuuS/C7Jkb2ft2/fP91NS7IsyeeTfDrJdcCb2rulm/Xs/6j2Scr6vXcskxyd5H195/6PJK8b5zqTfV6SvKh9InN1kjev5bMel+QjSb6aZA3w1CTPTvKT9vP/JsmynkPu8vPsv/Oa5AlJ/jfJqvbPJ6ztZ94etzGwF3BIVa2uqu8BXwJeNMEhLwGuqKojq2pNVd1UVedM81yS1iGTfU8m2bD9nr46ybXt99RWSQ4FngQc1X6/HTXOecc9tq27V5KPJ7mybavelWRBkofS/KL9+Pa813b8DC9JcnHb5v0q7RO3tq35dM9+d+o50Lazhyb5PnADsMNY25tkgzbuh/ccv6Rtq7Zst5+T5Kftfv+dZOeefR+V5MdtTCcBG64l/u8neX+Sa4BlSR6Q5Fvtz++qJJ9Jsmm7/6eAbYEvtz+ng8f5bFsn+VKSa5JcmORlXX6Wrf2BI6rqsqq6HDiCph3peuwJVVUDOJdGlAmVhuk+wGbAdjR3bNYDPtlubwvcCNylUeqxK3ABsAVwOPDx5K6PzZM8BDgQeGxVbQLsDlzSVr+a5k7Rk4GtgT/QPJ0A2K39c9O228AP0vTJvjbJthPE9AHgA1W1GHgA8LnJfgB9ngd8HtgUeC/wA5pf7Me8APh8Vd3Sd9xngeePffYk9waeAfz7ONeY8PMmeRjwEZrEYWtgc+C+45yj1wuAQ4FNgO8Ba4AXt5/h2cArkuzR7nuXn2fvidpfWE4DPthe+0iaJ0Obt/VvSPKVCeJ4MHBbVf2ip+xsYKcJ9n8ccEmasQFXtb8wPGKa55K0bpmsXdgfuBdwP5rvqX8EbqyqNwPfBQ5sv98OHOe84x7b1h0P3Ao8EHgUzXf4S6vq5+1+P2jPuylAkhckOWe84NubQh8Entm2eU8AfjqFz/8imjZ5E+DSscKquhk4Bdi3Z9+/Bb5dVb9P8mjgE8DL2893DPClNhG7B/BF4FM07f7J3Ll9G8+uNE9utqRpZwK8m+bv5KE0P8dlbWwvAn4NPLf9OR0+zvlOBC5rj98b+JfccXP1L9aSrO5E0w6M6dQmJNmOpu074e6eS6PNhErDdDvwtqq6uapurKqrq+oLVXVDVV1P8wX65EmOv7SqPtp2Mzse+DOaLlr9bgM2AB6WZP2quqSqLmrrXg68ub1TdDPNl/PemWDcVNsne9Oq+vUEMd0CPDDJFu3TjR+u5WfQ6wdV9cWqur2qbqRJlPaFP3W92Kct6/ddmm4DT2q3927PdcU4+072efcGvlJV32nrDqH5O5rMf1TV99uYb6qqM6vq3Hb7HJoGbLK/w17PBn5ZVZ+qqlur6kTgfOC5AFX1nqp6zgTHLgJW9ZWtovmFYDz3pfl5fpCmcT0N+I+20Z/quSStWyb7nryFJll4YFXdVlU/qqrrOp533GPbp1TPBF7bPjH/PfB+mu+ocVXVZ6tq54nqab67H55ko6q6cordlY+rqhXt9/B4N/B6E6oXcEe79DLgmKo6q/18xwM309zAehywPvCvVXVLVX0e+N+1xHFFVX2ojePGqrqwqk5vf2dYSXPTrVP7kuR+wF8Ar2/bqp8CH6PteVBV3xtLVifQ3y6sAhaNdxO3z4uB71bVrwZwLo0wEyoN08qqumlsI8k9kxyTpsvZdTRdxDbNxDP0/HbsTVXd0L5d1L9TVV0IvJamUfx9kn9PsnVbvR1wavvU6Vrg5zQJ2HiJWRf/QPOE4/y2O8dECcB4ftO3/Xmabh5b09zhKprk6U7abgT/zh2N3Avo6b/dZ7LPu3VvDFW1Brh6KjEn2TXJGWm6ba6iubO6xVrOMWZreu6Gti4Ftulw7GpgcV/ZYuD6Cfa/EfheVX2tqv4IvI/mF52HTuNcktYtk31Pfgr4OvDvSa5IM6HA+h3PO9Gx29EkG1f2XPMYmiczU9Z+dz+f5vv3yiSnJdlxCqfob4t6fQvYqP2u345mjOmpbd12wEFjn6H9HPej+W7fGri8p9sb3PX7ftI4kmzZtt+Xt78jfJqptS/XtDdre6/fpX2Bu7YLi4HVfZ9nPC+mueE7iHNphJlQaZj6vzwOAh4C7Np2mRvrIna379q0d/P+guYLv4DD2qrf0HSL2LTntWHbr3nKX25V9cuq2pemITwM+Hzb/WINcM+x/dokcUn/4X3nuhb4Bk2XihcAJ07yhXsizR3U7Wi6SXxhgv0m+7xX0jR+YzHekybJmPQj921/lma80f2q6l40ff8zwb79rqD5++m1LXD5Wo4D+AWwMMmDesp2ASa6K3vOJPFM9VyS1i0Tfk+2T1feXlUPo+lK9xyaX5phLd9xkxz7G5onOVv0XG9xVY11A5tOW/T1qvormp4b5wMfbavu1BbRdL2/y+GTnPd2mq7s+9K0S1/pSVJ+Axza93O7Z9vb4Epgm76nMBN1nZ8ojne3ZTu3vyO8kDv/fjDZz+kKYLMkvT0NurYv0Hz/79KzvdY2IckTaRK5z9/dc2n0mVBplGxC8+Tg2nY8zdsGcdIkD0nyl0k2AG5qrzE2G93RwKFtIjI2wPZ5bd1Kmm4TO0zhWi9MsqRtdK5ti2+j+SV9wzSTNqwPvIWmG+LafJamwd2L8bv7AVBVP2nj/Rjw9TYZG89kn/fzwHPavuT3AN7B1L8jNqG5C3hTkj+naXDHrO3n+VXgwe3YgIVJng88DJho3NSftHdkTwHekWTjtiF7Hs0d4fF8Gnhckqe3ye1rgauAn0/jXJLWLRN+TyZ5apJHtN8b19F04xtrT37HJO3FRMdW1ZU0N8+OSLI4zaQYD0gy1p3td8B92+/ltUozScZftzfzbqZ5IjIW40+B3dKMB74X8MZuP5I7+SzNE7D9uHO79FHgH9unV2m/P5/dJjE/oBkj9ur2+31P4M+neN1N2s9ybZJtgH/uq5/w519VvwH+G3h3mslBdqbpUTJRb45+JwCvS7JN22vkIOC4tRyzP/CFvqdi0z2XRpwJlUbJvwIb0fxi+0PgPwd03g2A97Tn/S3N06M3tXUfoHmi8o0k17fX3RX+1I3wUOD7bfeFx+WOhQInurP2f4AVSVa3596n7a+9CnglTcJzOc1dwssmOEevLwEPAn5XVWevZd8TgaczSeK1ls+7Avh/7fFX0gzE7hJjr1fSJCLXA2+lZ1KO8X6evQdW1dU0d2wPoulqeDDwnKq6CiDJm5J8bS3X3gj4Pc3P4hVj4wb6/96q6gKau5tHt5/zecBft93/Jj2XpHXehN+TNE90Pk+TEP0c+DbNDZqx4/ZOM/PsB8c572THvhi4B3AezXfS52meLkHTzW4F8NskY9+H+yWZ6DtpPZrv0SuAa2jGGb0SoKpOB06ieUr/IzrcsOpXVWfRtGFbA1/rKV9OM47qqPYzXEg7e1373bpnu/0HmoTslCle+u3Ao2nGHJ02zvHvBt7Sti//NM7x+wLb0/xcTqUZw306QJInte32RI4BvgycC/ysvf4xY5VJVqRn7bIkG9L0Lhlv/a5Jz6W5KXbZlCRJkqTp8QmVJEmSJE2TCZUkSZIkTZMJlSRJkiRNkwmVJEmSJE2TCZUkSZIkTdPCYQcwk7bYYovafvvthx2GJGlAfvSjH11VVf2LYo8k2yBJWndM1v6s0wnV9ttvz/Lly4cdhiRpQJJcOuwYurINkqR1x2Ttj13+JEmSJGmaTKgkSZIkaZpMqCRJkiRpmkyoJEmSJGmaTKgkSZIkaZpMqCRJkiRpmkyoJEmSJGmaTKgkSZIkaZrW6YV957Lt33DasEOYdy55z7OHHYIkDZ3tz3DYBklzl0+oJEnrvCQHJlme5OYkx/WUb5+kkqzueR3SU58khyW5un0dniRD+RCSpJHkEypJ0nxwBfAuYHdgo3HqN62qW8cpPwDYA9gFKOB04GLg6JkJU5I01/iESpK0zquqU6rqi8DVUzx0f+CIqrqsqi4HjgBeMuDwJElzmAmVJElwaZLLknwyyRY95TsBZ/dsn92WSZIEmFBJkua3q4DHAtsBjwE2AT7TU78IWNWzvQpYNNE4qiQHtGO1lq9cuXKGQpYkjRITKknSvFVVq6tqeVXdWlW/Aw4EnpFkcbvLamBxzyGLgdVVVROc79iqWlpVS5csWTKzwUuSRoIJlSRJdxhLlMaeQK2gmZBizC5tmSRJgAmVJGkeSLIwyYbAAmBBkg3bsl2TPCTJekk2Bz4InFlVY938TgBel2SbJFsDBwHHDeVDSJJGkgmVJGk+eAtwI/AG4IXt+7cAOwD/CVwP/Ay4Gdi357hjgC8D57b1p7VlkiQBrkMlSZoHqmoZsGyC6hMnOa6Ag9uXJEl34RMqSZIkSZomEypJkiRJmqZZT6iS7JPk50nWJLkoyZPa8qclOT/JDUnOSLJdzzFJcliSq9vX4ROtASJJkiRJs2VWE6okfwUcBvwdzeKJuwEXt6vSnwIcAmwGLAdO6jn0AGAPmulqdwaeA7x81gKXJEmSpHHM9hOqtwPvqKofVtXtVXV5VV0O7AmsqKqTq+ommoHDuyTZsT1uf+CIqrqs3f8I4CWzHLskSZIk3UmnhCrJkiRLerYfkeRdSfad7Li+cywAlgJLklyY5LIkRyXZCNgJOHts36paA1zUltNf377fiXEkOSDJ8iTLV65c2TU8SZIkSZqyrk+oPgc8F6Dtnvcd4G+Ao5Mc1PEcWwHrA3sDTwIeCTyKZh2QRcCqvv1X0XQLZJz6VcCi8cZRVdWxVbW0qpYuWbKkv1qSJEmSBqZrQrUz8MP2/d7AhVW1E/Biuo9lurH980NVdWVVXQUcCTwLWA0s7tt/Mc1Ci4xTvxhY3a4PIkmSJElD0TWh2ogmqQF4OvCl9v2Pgft1OUFV/QG4DBgvCVpBM+EEAEk2Bh7Qlt+lvn2/AkmSJEkaoq4J1S+BPZPcD3gG8I22fCvg2ilc75PAq5JsmeTewGuBrwCnAg9PsleSDYG3AudU1fntcScAr0uyTZKtgYOA46ZwXUmSJEkauK4J1dtppju/BPhhVZ3Vlu8O/GQK13sn8L/AL4Cft8ceWlUrgb2AQ4E/ALsC+/QcdwzwZeBc4GfAaW2ZJEmSJA3Nwi47VdUpSbYFtubOs+19E/hC14tV1S3AK9tXf903gR3vclBTV8DB7UuSJEmSRsJan1AlWT/Jb4EtquonVXX7WF1VndXTLU+SJEmS5pW1JlTtU6VbGH8yCUmSJEmat7qOofoQ8MYknboISpIkSdJ80DVBehLwZODyJD8D1vRWVtVfDzowSZIkSRp1XROqq5jC5BOSJEmSNB90neXv72Y6EEmSJEmaa7qOoQIgydIkz0+ycbu9seOqJEmSJM1XnZKhJFsBXwIeSzPb34OAi4EjgZuA18xUgJIkSZI0qro+oXo/8Ftgc+CGnvKTgWcMOihJkiRJmgu6dtd7GvC0qvpDkt7yi4BtBx6VJEmSJM0BXZ9QbQT8cZzyJTRd/iRJkiRp3umaUH0HeEnPdiVZALwe+K9BByVJ0iAlOTDJ8iQ3Jzmup/xxSU5Pck2SlUlOTvJnPfXLktySZHXPa4ehfAhJ0kjqmlAdDLwsyenABsARwHnAE4E3zlBskiQNyhXAu4BP9JXfGzgW2B7YDrge+GTfPidV1aKe18UzHawkae7oug7VeUkeAbwCuBnYkGZCig9X1ZUzGJ8kSXdbVZ0CzfIfwH17yr/Wu1+So4Bvz250kqS5rPMaUlX1W+BtMxiLJEnDthuwoq/suUmuAa4Ejqqqj0x0cJIDgAMAtt3WOZskaT7oug7VbhNUFc2kFBdV1TUDi0qSpFmWZGfgrcDzeoo/R9Ml8HfArsAXklxbVSeOd46qOrbdn6VLl9bMRixJGgVdn1CdSZM8AYzNm967fXuSLwEvqqo1gwtPkqSZl+SBwNeA11TVd8fKq+q8nt3+O8kHgL2BcRMqSdL803VSimcDPwdeCDywfb2QplvEXu3rkcB7Bh+iJEkzJ8l2wDeBd1bVp9aye3HHjUVJkjo/oXoXzV273inSL06yEjisqh6T5DbgQ8CrBh2kJEl3R5KFNG3eAmBBkg2BW4GtgG/RTLJ09DjHPY9m6ZBrgccCrwbeNEthS5LmgK4J1cOAy8cpv7ytAzgXuM8ggpIkacDewp0nVnoh8HaaJ047AG9L8qf6qlrUvt2HZqr1DYDLaG4iHj8rEUuS5oSuXf7OA96cZIOxgvb9m9o6gPsBv53sJEnOTHJTz+KIF/TUPS3J+UluSHJG2wVjrC5JDktydfs6PIldLiRJnVTVsqpK32tZVb29fd+7ztSinuP2rarN2/Idq+qDw/wckqTR0zWheiWwO3B5mxSdQfN0aneatamgucP3bx3OdWBPo/UQgCRbAKcAhwCbAcuBk3qOOQDYA9gF2Bl4DvDyjrFLkiRJ0ozourDvWUnuT9NF4iE0A3JPBD4zNqtfVZ1wN+LYE1hRVScDJFkGXJVkx6o6H9gfOKKqLmvrjwBeBtylv7skSZIkzZapLOy7BjhmANd8d5L3ABcAb66qM4GdgLN7r5Xkorb8/P769v1OA4hFkiRJkqatc0KV5H7Ak4At6esqWFVHdjzN62nGXP2RZqDvl5M8ElgErOzbdxWwSft+UbvdW7coSarqTgsnukq9JEmSpNnSKaFKsh/NLEe30iQ+vUlMAZ0Sqqo6q2fz+CT7As8CVgOL+3ZfDFzfvu+vXwys7k+m2mu4Sr0kSZKkWdF1Uop3AEcAi6tq+6q6f89rh7tx/bEFElfQTDgBQJKNgQe05fTXt+9XIEmSJElD1DWh2gr4WFXdNt0LJdk0ye5JNkyysH3qtRvwdeBU4OFJ9moXW3wrcE47IQXACcDrkmyTZGvgIOC46cYiSZIkSYPQdQzVV4FdgYvvxrXWB94F7AjcRjPZxB5VdQFAkr2Ao4BPA2fRjLEacwzNtOznttsfYzATZEiSJEnStHVNqE4HDkuyE01Sc0tvZVWdsrYTVNVK4LGT1H+TJtkar66Ag9uXJEmSJI2ErgnV2NOgN41TV8CCwYQjSZIkSXNH14V9u461kiRJkqR5w0RJkiRJkqapU0KVxiuTrEhyQ5Id2vI3JPnbmQ1RkiRJkkZT1ydUrwHeQrNgbnrKLwcOHHRQkiRJkjQXdE2o/hF4WVV9ALi1p/zHwE4Dj0qSJEmS5oCus/xtB/xsnPJbgI0GF44kSZI087Z/w2nDDmFeuuQ9zx52CAPX9QnVxcCjxyl/FnDe4MKRJEmSpLmj6xOq9wFHJbknzRiqxyd5Ec1Cu38/U8FJkiRJ0ijrug7VJ5MsBP4FuCfwKZoJKV5dVSfNYHySJEmSNLK6PqGiqj4KfDTJFsB6VfX7mQtLkiRJkkZf13Wo1kuyHkBVXQWsl+SlSZ4wo9FJkiRJ0gjrOinFacCrAJIsApYD7wW+neTFMxSbJEmSJI20rgnVY4Bvte/3BK4DtgReBvzTDMQlSdLAJDkwyfIkNyc5rq/uaUnOT3JDkjOSbNdTlySHJbm6fR2eJHe5gCRp3uqaUG0CXNu+fwZwalXdQpNkPWAG4pIkaZCuAN4FfKK3sB0XfApwCLAZTQ+M3smWDgD2AHYBdgaeA7x85sOVJM0VXROqXwNPTLIxsDtwelu+GXDDTAQmSdKgVNUpVfVF4Oq+qj2BFVV1clXdBCwDdkmyY1u/P3BEVV1WVZcDRwAvmZ2oJUlzQdeE6kiaqdIvo5ku/Ttt+W7AuTMQlyRJs2En4OyxjapaA1zUlt+lvn2/ExNIckDbtXD5ypUrZyBcSdKo6ZRQVdUxwONpFvH9i6q6va26iKabhCRJc9EiYFVf2Sqaru7j1a8CFk00jqqqjq2qpVW1dMmSJQMPVpI0eqayDtVymr7lACRZv6pOm5GoJEmaHauBxX1li4HrJ6hfDKyuqpqF2CRJc0DXdahenWSvnu2PAzcmuSDJQ2YsOkmSZtYKmgknAGjHCj+gLb9Lfft+BZIktbqOoXo1sBIgyW7A3wIvAH5KM0BXkqSRlWRhkg2BBcCCJBsmWQicCjw8yV5t/VuBc6rq/PbQE4DXJdkmydbAQcBxQ/gIkqQR1TWh2ga4pH3/XODkqvoczWxIj5vqRZM8KMlNST7dU+Y6IJKkmfIW4EbgDcAL2/dvqaqVwF7AocAfgF2BfXqOOwb4Ms0ETD+jWej+mNkLW5I06romVNcBY6Nr/wr4r/b9LcCG07juh4H/HdtwHRBJ0kyqqmVVlb7Xsrbum1W1Y1VtVFVPqapLeo6rqjq4qjZrXwc7fkqS1KtrQvUN4KPt2KkHAl9ry3cCfjWVCybZh2aR4P/qKXYdEEmSJElzTteE6v8B3we2APauqmva8kcDJ3a9WJLFwDto+qD3Gtg6IK4BIkmSJGm2dJo2vaquA141Tvnbpni9dwIfr6rf9A2BWkQ76UWPTuuA9He9qKpjgWMBli5darcMSZIkSTOm8zpUY5LcB7hHb1lV/brDcY8Eng48apxq1wGRJEmSNOd0SqiS3Av4IM106fcYZ5cFHU7zFGB74Nft06lFNFPXPgw4mmac1Nj1JloH5H/abdcBkdYR27/B9cFn2yXvefawQ5AkaZ3RdQzV+2iSmD2Am2jWoPpn4DLg+R3PcSxNkvTI9nU0zfSzu+M6IJIkSZLmoK5d/p4J7FtV301yG/CjqjopyZU005d/fm0nqKobgBvGtpOsBm5q1wAhyV7AUcCngbO46zogO9CsAwLwMVwHRJIkSdKQdU2oNgUubd+vAjYHLgR+QJPcTNnY+h89298Edpxg3wIObl+SJEmSNBK6dvm7iOYJEcDPgX3SDITaE7hmwqMkSZIkaR3WNaE6Dti5ff8emm5+fwTeCxw2+LAkSZIkafR1XYfq/T3vv5XkocBjgF9W1bkTHylJkiRJ664pr0MFUFWXcseYKkmSJEmal7p2+SPJHkm+k+Sq9vXdJH8zk8FJkiRJ0ijrlFAlOQg4CbiAO2bbOx/4bJJ/mrnwJEmSJGl0de3y90/AgVX10Z6yTyT5H+AdNAv/SpIkSdK80rXL3yLgjHHKz2jrJEmSJGne6ZpQfRHYe5zyvYAvDSwaSZIkSZpDunb5uxB4Q5KnAj9oyx7Xvo5M8rqxHavqyMGGKEmSJEmjqWtC9RLgD8CD29eYPwB/17NdgAmVJEmSpHmh68K+95/pQCRJkiRprum8DpUkSZIk6c5MqCRJkiRpmkyoJEnzWpLVfa/bknyords+SfXVHzLsmCVJo6PrpBSSJK2TqupP6ykm2Rj4HXBy326bVtWtsxqYJGlOmPAJVZJPJNmkfb9bEpMvSdK6bm/g98B3hx2IJGlumKzL3wuBjdv3ZwCbzXw4kiQN1f7ACVVVfeWXJrksySeTbDGMwCRJo2myp06XAK9K8g0gwOOT/GG8HavqOzMQmyRJsybJtsCTgX/oKb4KeCzwU2Bz4MPAZ4DdJzjHAcABANtuu+0MRitJGhWTJVT/DHwUeCPNgr2nTrBfAQsGHJckSbPtxcD3qupXYwVVtRpY3m7+LsmBwJVJFlfVdf0nqKpjgWMBli5d2v+US5K0Dpqwy19V/UdVbUnT1S/ATsCScV5bdr1Ykk8nuTLJdUl+keSlPXVPS3J+khuSnJFku566JDksydXt6/AkmfKnlSRpYi8Gjl/LPmNJkm2QJAnoMMtfVV2b5KnALwcww9G7gX+oqpuT7AicmeQnwKXAKcBLgS8D7wROAh7XHncAsAewC01jdjpwMXD03YxHkiSSPAHYhr7Z/ZLsClwL/BK4N/BB4MyqWjXbMUqSRlOnmfuq6ttJNkjyYuBhNEnNecBnq+rmrherqhW9m+3rAcBjgBVVdTJAkmXAVUl2rKrzaQYJH1FVl7X1RwAvw4RKkjQY+wOnVNX1feU7AP9C0xvjOpobevvOcmySpBHWaWHfJA8DfgEcCexK8+To/cAvkjx0KhdM8m9JbgDOB64EvkrTnfDssX2qag1wUVtOf337fifGkeSAJMuTLF+5cuVUQpMkzVNV9fKqetE45SdW1f2rauOq+rOqenFV/XYYMUqSRlOnhAr4AM0MR9tW1ZOq6knAtjSJzb9O5YJV9UpgE+BJNN38bgYWAf3dJ1a1+zFO/Spg0XjjqKrq2KpaWlVLlyxZMpXQJEmSJGlKui7W+0Tgsb0zGlXVdUneDPxwqhetqtuA7yV5IfAKYDWwuG+3xcBY14v++sXA6nHWCZEkSZKkWdP1CdVNwKbjlN+rrZuuhTRjqFbQTDgBQJKNe8rpr2/f947HkiRJkqRZ1zWh+jLw0SRPTLKgff0FcAzwpS4nSLJlkn2SLGqP351mYO+3aNa4eniSvZJsCLwVOKedkALgBOB1SbZJsjVwEHBc508pSZIkSTOga5e/19CszfFd4La2bD2aZOq1Hc9RNN37jm6PvRR4bVX9B0CSvYCjgE8DZwH79Bx7DM1MS+e22x9ryyRJkiRpaLpOm34t8LwkDwQeSrOg4XlVdWHXC1XVSuDJk9R/E9hxgroCDm5fkiRJkjQSuj6hAqBNoDonUZIkSZK0Lus6hkqSJEmS1MeESpIkSZKmyYRKkiRJkqZprQlVkoVJXtlOVy5JkiRJaq01oaqqW4H3AuvPfDiSJEmSNHd07fL3Q+DRMxmIJEmSJM01XadN/yhwRJLtgB8Ba3orq+rHgw5MkiRJkkZd14Tqs+2fR45TV8CCwYQjSZIkSXNH14Tq/jMahSRJkiTNQZ0Sqqq6dKYDkSRJkqS5pvM6VEmemeQrSc5Lcr+27KVJnjZz4UmSJEnS6OqUUCXZD/gc8Eua7n9jU6gvAA6emdAkSZIkabR1fUJ1MPCyqvr/gFt7yn8IPHLQQUmSJEnSXNA1oXoQ8INxylcDiwcXjiRJkiTNHV0TqiuAB49Tvhtw0eDCkSRJkqS5o2tCdSzwwSRPbLfvl2R/4HDgIzMSmSRJsyTJmUluSrK6fV3QU/e0JOcnuSHJGe0i95IkAR0Tqqo6HDgFOB3YGDgDOBo4uqo+PHPhSZI0aw6sqkXt6yEASbagaf8OATYDlgMnDTFGSdKI6bqwL1X15iSHAg+jScTOq6rVMxaZJEnDtyewoqpOBkiyDLgqyY5Vdf5QI5MkjYTO61C1CrgJuAG4bfDhSJI0NO9OclWS7yd5Slu2E3D22A5VtYZm7PBOsx+eJGkUdV2HaoMk/wpcQ9OwnANck+QDSTacwjk+nuTSJNcn+UmSZ/bUT9hHPY3Dklzdvg5Pkil9UkmSJvZ6YAdgG5pxw19O8gBgEbCqb99VwCbjnSTJAUmWJ1m+cuXKmYxXkjQiuj6h+giwN/BSminUH9i+/xvg3zqeYyHwG+DJwL1o+qN/Lsn2HfqoHwDsAewC7Aw8B3h5x+tKkjSpqjqrqq6vqpur6njg+8CzGH95kMXA9ROc59iqWlpVS5csWTKzQUuSRkLXMVT/F9izqk7vKbs4ye+BLwB/v7YTtN0klvUUfSXJr4DHAJszeR/1/YEjquqytv4I4GU0E2NIkjRoBQRYQdMGAZBkY+ABbbkkSZ2fUK0BLh+n/HLgxulcOMlWNGtbrWDtfdTvVN++t/+6JOluS7Jpkt2TbJhkYZL9aNZZ/DpwKvDwJHu1XdzfCpzjhBSSpDFdE6oPAW9LstFYQfv+kLZuSpKsD3wGOL5tlNbWR72/fhWwaLxxVPZflyRN0frAu4CVwFXAq4A9quqCqloJ7AUcCvwB2BXYZ1iBSpJGz4Rd/pJ8qa/oKcDlSc5ptx/RHr/xVC6YZD3gU8AfgQPb4rX1Ue+vXwysrqrqP39VHUszoJilS5fepV6SpF5t0vTYSeq/Cew4exFJkuaSycZQXd23/YW+7V9N9WLtE6WPA1sBz6qqW9qqtfVRX0EzIcX/tNu7YP91SZIkSUM2YUJVVX83A9f7CPBQ4OlV1Tv26lTgvUn2Ak7jrn3UTwBel+SrNAOFD2IaXQ0lSZIkaZCmurDvtLXrSr0ceCTw2ySr29d+HfqoHwN8GTgX+BlN0nXMbMUuSZIkSePpNG16knvTTHn+VGBL+hKxqtpybeeoqktppqCdqH7CPurtWKmD25ckSZIkjYSu61CdQDNN+fHA72i63UmSJEnSvNY1oXoK8OSq+vEMxiJJkiRJc0rXMVQXTWFfSZIkSZoXuiZJrwHenWSXJAtmMiBJkiRJmiu6dvm7ENgI+DFAs5zUHarKJEuSJEnSvNM1oToRuBfwapyUQpIkSZKA7gnVUuDPq+pnMxmMJEmSJM0lXcdQnQcsnslAJEmSJGmu6ZpQvQU4MsnTk2yVZLPe10wGKEmSJEmjqmuXv6+2f36DO4+fSrvtpBSSJEmS5p2uCdVTZzQKSZIkSZqDOiVUVfXtmQ5EkiRJkuaaTglVkkdPVl9VPx5MOJIkSZI0d3Tt8recZqxU74q+vWOpHEMlSZIkad7pmlDdv297feBRwJuBNw40IkmSJEmaI7qOobp0nOILk6wC3gZ8baBRSZIkSdIc0HUdqon8CnjkAOKQJEmSpDmn66QU/Yv3BvgzYBlwwYBjkiRJkqQ5oesYqqu48yQU0CRVvwGeP9CIJEmSJGmOmO7CvrcDK4ELq+rWwYYkSdLsSbIB8G/A04HNgAuBN1XV15JsT9O9fU3PIYdV1TtnPVBJ0khyYV9J0ny3kKbHxZOBXwPPAj6X5BE9+2zqDURJ0ngmnZQiyWZdXl0vluTAJMuT3JzkuL66pyU5P8kNSc5Isl1PXZIcluTq9nV4ktzlApIkTVFVramqZVV1SVXdXlVfoXkq9ZhhxyZJGn1rm+XvKpqufZO9fj+F610BvAv4RG9hki2AU4BDaLpbLAdO6tnlAGAPYBdgZ+A5wMuncF1JkjpJshXwYGBFT/GlSS5L8sm2zZro2APaG4fLV65cOeOxSpKGb21d/vrHTvX6P8BrgM5dIKrqFIAkS4H79lTtCayoqpPb+mXAVUl2rKrzgf2BI6rqsrb+COBlwNFdry1J0tokWR/4DHB8VZ2fZBHwWOCnwObAh9v63cc7vqqOBY4FWLp0af9kTpKkddCkCdV4Y6eSPBo4DNgNOAYYxMDcnYCze667JslFbfn5/fXt+53GO1GSA2ieaLHtttsOIDRJ0nyQZD3gU8AfgQMBqmo1Ta8JgN8lORC4MsniqrpuOJFKkkZJ54V9k9w/yWeBs4BrgIdV1aurahB9GhYBq/rKVgGbTFC/Clg03jiqqjq2qpZW1dIlS5YMIDRJ0rqubU8+DmwF7FVVt0yw69hTJ8fxSpKADglVks2TfIDmSdF9gMdX1fOr6qIBxrEaWNxXthi4foL6xcDqqrI7hSRpED4CPBR4blXdOFaYZNckD0myXpLNgQ8CZ1ZV/01ASdI8tbZZ/t4EXEQzlezzquovq2r5ZMdM0wqaCSfGrrsx8ADuGBB8p/r2fe9gYUmSpqWdVfblwCOB3yZZ3b72A3YA/pPmBt/PgJuBfYcVqyRp9KxtUop3ATcClwGvTPLK8Xaqqr/ucrEkC9trLgAWJNmQZlKLU4H3JtkLOA14K3BOOyEFwAnA65J8laa7xUHAh7pcU5KkyVTVpUzehe/E2YpFkjT3rC2hOoE7+osPwluAt/VsvxB4e1Uta5Opo4BP04zT2qdnv2No7hKe225/rC2TJEmSpKFZ2yx/LxnkxapqGbBsgrpvAjtOUFfAwe1LkiRJkkZC51n+JEmSJEl3ZkIlSZIkSdNkQiVJkiRJ02RCJUmSJEnTZEIlSZIkSdNkQiVJkiRJ02RCJUmSJEnTZEIlSZIkSdNkQiVJkiRJ02RCJUmSJEnTZEIlSZIkSdNkQiVJkiRJ02RCJUmSJEnTZEIlSZIkSdNkQiVJkiRJ02RCJUmSJEnTZEIlSZIkSdNkQiVJkiRJ02RCJUmSJEnTZEIlSZIkSdM0ZxKqJJslOTXJmiSXJnnBsGOSJM0PtkGSpIksHHYAU/Bh4I/AVsAjgdOSnF1VK4YalSRpPrANkiSNa048oUqyMbAXcEhVra6q7wFfAl403MgkSes62yBJ0mTmREIFPBi4rap+0VN2NrDTkOKRJM0ftkGSpAnNlS5/i4BVfWWrgE36d0xyAHBAu7k6yQUzHJvubAvgqmEHMR05bNgRaI7x3/pwbDeEa9oGzR3+v9R84b/12Tdh+zNXEqrVwOK+ssXA9f07VtWxwLGzEZTuKsnyqlo67Dikmea/9XnFNmiO8P+l5gv/rY+WudLl7xfAwiQP6inbBXAwsCRpptkGSZImNCcSqqpaA5wCvCPJxkmeCDwP+NRwI5MkretsgyRJk5kTCVXrlcBGwO+BE4FXOF3tSLKri+YL/63PL7ZBc4P/LzVf+G99hKSqhh2DJEmSJM1Jc+kJlSRJkiSNFBMqSZIkSZomEypJkiRJmiYTKknqIMkGSQ5NcnGSVW3ZM5IcOOzYJEnrLtuf0WdCpYFI8tAkhyT5cLu9Y5Kdhx2XNEDvBx4O7AeMzeazAnjF0CKSBNgGaZ1n+zPiTKh0tyX5v8C3gW2AF7XFi4AjhxaUNHh/A7ygqn4A3A5QVZfT/LuXNCS2QZoHbH9GnAmVBuEdwDOq6h+B29qys4FdhheSNHB/BBb2FiRZAlw9nHAktWyDtK6z/RlxJlQahC1pGi+441F09byX1gUnA8cnuT9Akj8DjgL+fahRSbIN0rrO9mfEmVBpEH7EHd0sxuwD/M8QYpFmypuAS4BzgU2BXwJXAG8fXkiSsA3Sus/2Z8Slyhs4unuS7Ah8A/gV8DjgTODBNF0wfjnE0KQZ0Xa1uKr8ApWGzjZI84ntz2gyodJAJLkn8BxgO+A3wFeqavVwo5IGJ8kOE9VV1cWzGYukO7MN0rrM9mf0mVBp4Nr/+LdV1aXDjkUalCS304zJSE9xAVTVgqEEJekubIO0rrH9GX2OodLdluTEJE9o3/8dzdoI5yX5h+FGJg1OVa1XVQvaP9cDtgaO5a5jNyTNItsgretsf0afT6h0tyX5PXDfqvpjknOBfwSuBb5YVQ8aanDSDEqyAfCLqtpu2LFI85VtkOYj25/RsnDtu0hrdY+2IdsG2Kyqvg+QZKshxyXNtIcA9xx2ENI8Zxuk+cj2Z4SYUGkQfprkjTSDgU8DaBu264YalTRASb7Lnde1uSewE82iopKGxzZI6zTbn9FnQqVB+AfgncAtwD+3ZY8HPjO0iKTB+1jf9hrgbKdllobONkjrOtufEecYKklaiyQLgE8AB1TVzcOOR5I0P9j+zA0mVJqWJH/fZb+q+sRMxyLNhiRXAttW1S3DjkWa72yDNJ/Y/ow+EypNS5IzOuxWVfWXMx6MNAuSHAxsCrzNRk0aLtsgzSe2P6PPhEqSJpFk36o6MclvgPsAtwEr6RkgXFXbDis+SdK6yfZn7jCh0kAlCT0reVfV7UMMR7rbklxXVYuTPHmifarq27MZk6Tx2QZpXWL7M3eYUOlua6enPQrYjeaR9J9U1YJhxCQNSpLrq2qTYcchaXy2QVpX2f7MHU6brkE4GrgBeBrwbZpGbRnw1SHGJA3KgiRPpeeud7+q+tYsxiPpzmyDtK6y/ZkjfEKluy3J1TSzz6xJcm1VbZpkM+C/q2rHYccn3R1JbgMuZeIGrapqh1kMSVIP2yCtq2x/5g6fUGkQbgNubd9fm2QJzQr12wwvJGlg1thgSSPNNkjrKtufOWK9YQeguSvJfdq3ZwHPat9/HTgJOAVYPoy4JEnrPtsgSaPCLn+atp7ZZzalSc4/BuwH/BOwCPjXqrpyiCFKd5uDgqXRZBukdZ3tz9xhQqVp6/+PnuSaqtpsmDFJkuYH2yBJo8Iuf7o7zMYlScNiGyRpJDgphe6OhX3TefZvO52nJGmm2AZJGgl2+dO0JbmEye8QOp2nJGlG2AZJGhUmVJIkSZI0TY6hkiRJkqRpMqGSJEmSpGkyoZIkSZKkaTKhkiRJkqRpMqGSJEmSpGn6/wFt3g+8+RnihAAAAABJRU5ErkJggg==",
      "text/plain": [
       "<Figure size 864x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Inspect the target variable\n",
    "train_survived_value_counts = df_train.survived.value_counts()\n",
    "test_survived_value_counts = df_test.survived.value_counts()\n",
    "\n",
    "\n",
    "plt.figure(figsize=(12, 4))\n",
    "\n",
    "plt.subplot(121)\n",
    "train_survived_value_counts.plot.bar()\n",
    "train_sex_ratio = train_survived_value_counts[True]/train_survived_value_counts[False]\n",
    "plt.title(f'Train set: survivied ratio: {train_sex_ratio:.2f}')\n",
    "plt.ylabel('Number of passengers')\n",
    "\n",
    "plt.subplot(122)\n",
    "test_survived_value_counts.plot.bar()\n",
    "test_sex_ratio = test_survived_value_counts[True]/test_survived_value_counts[False]\n",
    "plt.title(f'Test set: surived ratio: {test_sex_ratio:.2f}')\n",
    "\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next up, let's check whether the ratio of male to female passengers is not too dissimilar between the two sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:38.073343Z",
     "start_time": "2020-05-01T17:12:37.733604Z"
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA1QAAAEUCAYAAAAspncYAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAA1i0lEQVR4nO3debhkVXnv8e8PmgjSNAi0RFAacUJRcWjFEVQMRqORgFdxAk0QEi8OV4yiIqKCiopGQWVwAFQIopCI4IBxnm01iI2gjDLbTA3NJOB7/9j7QFGcYZ/T53TV6fP9PM9+etdae3irurveWnutvXaqCkmSJEnS5K0x6AAkSZIkabayQSVJkiRJU2SDSpIkSZKmyAaVJEmSJE2RDSpJkiRJmiIbVJIkSZI0RTaoNKOSfD3J7oOOY1VJ8owklww4hn9LcmWSFUk2WoXnfVWSH62q841y/rcn+fSgzi9Jg5DkgCRfGOD5k+RzSa5N8otVfO6jkxy4Ks/Zd/459RtHY7NBpXtof4iPLH9NcnPP65dP5lhV9dyqOmamYu2XZIsklWTeqjrnMEmyFvARYMeqml9VVw86ppkwWsO1qt5XVXtM0/Hfm+TMJLcnOWCCbe+V5PC2EXtNklOSbDaVY0m6y3TmovZ430syLd8Rfccd6MWkIfA04O+A+1fVEwcdzEwZreE6Xb9xkvxNki8nubD9DfOMCbZf0bfckeTQnvo9kpzb1n0jyaYrG6PGZ4NK99D+EJ9fVfOBPwEv6Cn74sh2c7XRMuQ2AdYGlg46kKlqr3YO+rvpXOAtwKkdtn0D8GTg0cCmwHXAoT31kzmWpFbXXKSBWwRcWFU3DjqQqRqS3zM/Al4BXDHRhn3/NzYBbgZOBEiyPfA+4IXAhsAFwPEzFbQag/7RollkpFcgyVuTXAF8Lsl9knwtybK2u/9rSe7fs8+dVwRHruIl+XC77QVJnjvO+d6a5NIkNyQ5J8kObfkaSfZNcl6Sq5N8KcmG7W4/aP+8rr0y8+QO7+uAJCcm+UJ7rjOTPDTJ25L8OcnFSXbs2f7VSX7fbnt+kr3GOfamSb7Sfj4XJHn9GNs9KckVSdbsKfunJL9t15+YZEmS69uekI+McoyHAuf0vP/vtOVbJTm97T05J8mLe/Y5Oskn0wxbWJHkx0n+Nsl/tH9HZyd5bM/2I5/7DUnOSvJP47z3Mc87yrbfS3JQkh8DNwFbjvU5J1kX+Dqwac/VuU3Td/UwyT8mWZrkuvb4Dx/r/P2q6piq+jpwQ4fNHwh8s6qurKpbgP8Etp7isSRNYLwckGTt9rv86vb//i+TbJLkIODpwGHtd8Zhoxx31H3buvWTfCbJ5W1eOjDJmu33yuHAk9vjXtfxPXyvPcZP2v1OSbJRki+23/O/TLJFz/Yfa3PR9Ul+leTp4xz7Se1xr0tyRsbo7Wg/wy/3lX0sycfb9Ve13703tPnrHr2CSf4F+HTP+393W/78JP/bxvCTJI/u2efCJP+e5LdJbmw/103aPHRDkm8nuU/P9iemyY/Lk/wgydb9cfRsO+Z5R9m2kvzfJH8E/tjz/u/xOSf5e+DtwEva93lGW977G2eNJPsluSjNb4djk6w/1vl7VdVfquo/qupHwB1d9unxIuDPwA/b1y8ATqyqpVX1F+C9wHZJHjTJ42oSbFBpsv6W5orHImBPmn9Dn2tfb05zleQeiarHtjQ/+jcGPgh8Jkn6N0ryMGBv4AlVtR7wHODCtvr1wE7A9jQ9AtcCn2jrtmv/3KC9evPTJJu3X66bjxPXC4DPA/cBfgN8s31vmwHvAY7o2fbPwPOBBcCrgY8medwo72EN4BTgjPY4OwBvTPKc/m2r6mfAjcCzeopfBhzXrn8M+FhVLQAeBHxplGP8gbt+yG9QVc9K0/g4vT3OfYGXAp/sS0gvBvaj+Tu5Ffgp8Ov29ZdphhCOOI/mR8n6wLuBLyS53yjvvct5+72S5t/UesBFjPE5t1dBnwtc1nOV7rK+8z+U5orcG4GFwGnAKUn+pq3/ZJJPjhPLZHwGeGqaRt29gZfTNPgkzYzxcsDuNN9PDwA2Av4VuLmq3kHzg3Pv9jtj71GOO+q+bd0xwO3Ag4HHAjsCe1TV79vtftoedwOAJC9Le0FsHLvSfO9tRvO9/lOafLoh8HvgXT3b/hJ4TFt3HHBikrX7D5hmuPGpwIHttm8GvpJk4SjnPx54XpIF7b5r0uSD49rv8I8Dz21z8FOA/+0/QFV9pu/9v6vNh58F9qL5HI8AvprkXj277kIzTPChNPn36zQNlo1pcm/vxcevAw+hySW/Bkbtnex43n470fwueUT7etTPuaq+QdPrc0L7PrcZ5VivapdnAlsC8+n5PdQ2IF82TixTtTtwbFXVyKnahZ7XAI+cgXOrZYNKk/VX4F1VdWtV3VxVV1fVV6rqpqq6ATiIJsmN5aKqOqqq7qBJUPej6a7udwdwL+ARSdaqqgur6ry2bi/gHVV1SVXdChwAvChjdNlX1Z+qaoOq+tM4cf2wqr5ZVbfTdJsvBD5QVbfR9DhskWSD9ninVtV51fg+8C2aRka/JwALq+o97dWn84GjaJLoaI6naXiQZD3gedzVTX8b8OAkG1fVirYB1sXzaYZifK6qbq+qXwNfobmiNeLkqvpV27tyMnBLVR3b/h2dQPPjgfa9n1hVl1XVX6vqBJqreqONme9y3n5Ht1fUbq+q2ybxOY/mJcCpVXV6+3f4YWAdmh8FVNVrq+q1HY81kT/QDEe6FLgeeDhNI1zSzBgvB9xG82P6wVV1R/vddn3H4466b5pequcCb6yqG6vqz8BHGfu7nKo6rqrG7B1pfa79jltO02g4r6q+3ZOHer97v9Dm29ur6hCa/PiwUY75CuC0qjqt/Z4+HVhCk0/6Y7yIpoGyU1v0LOCmnvzyV+CRSdapqsurqutQ8tcAR1TVz9vP8Riai3VP6tnm0LZX/1Kahu7Pq+o37d/nyX3v/bNVdUPP3/U2Y/T8dDlvv/dX1TVVdXN7rq6f82heDnykqs6vqhXA24BdR36bVNWjq+q4cY8wSe2F4u1pfk+NOA14cZJHJ1kH2B8o4N7TeW7dnQ0qTday9oc3AEnuneSItov7epohdxukZ+hanzvHBlfVTe3q/P6Nqupcmt6FA4A/J/nP3HVT5SLg5LbX6TqaK3l3MHrDrKsre9ZvBq5qGxQjr++MM8lzk/wszVC262gS1cajHHMRzbC063piffs4cR4H7NxeTdsZ+HWb8AD+heZK3tlphoI8v+P7WgRs2xfDy2l6Gsd67/2v7/z7SbJbz3CK62iueI313ic6b7+Le19M4nMezaY0vVwAVNVf2+NvNuYeU/cpmvvWNgLWBU7CHippJo2XAz5PM8LgP5NcluSDaSbr6WKsfRcBawGX95zzCJoek5Uxme/efdIMgV7enn99xv7u/T99371Po7l4OZrjaC/k0TMqopqRAC+h6X26PMmpSbbq+L4WAfv0xfAAmu/lEZ3ee5phlR9IM7zzeu4aqTLWe5/ovP36807Xz3k0d8s77fo8Vu63yUR2A35UVReMFFTV/9D0bn6ljeFCmiHnA52BeHVng0qTVX2v96G5erNtNcPRRobc3WMY36RP1FzhexrNl2QBB7dVF9MMQ9igZ1m7vdLVH9+0ahs7X6Hp8dikmuEdpzH6+70YuKAvzvWq6h5XCgGq6iyaL7/ncvfhflTVH6vqpTQJ/GDgy+2QjIlcDHy/L4b5VfVvnd90K8kimh62vYGN2vf+O8Z+75M9751/dx0+54n+ni+j+XczcrzQJNZLJ9hvKrah6V27pr2CeijwxCRdk7CkyRkzB7S92++uqkfQ9Eg/n+ZHJ0zwvTHOvhfT9HRs3HO+BVU1MoR5pvPO04G30gzHu0/7fbicsb97P9/32axbVR8Y4/AnAs9Ic+/zP3H3vPPNqvo7msbY2TTf/11cDBzUF8O9q2oqEyO8jGZyhWfTNG62aMvHeu+TPW9v3pnoc55U3qG5DeJ27t5YnG67cffeKQCq6hNV9ZCqui9NLp1Hk681Q2xQaWWtR3M16bo0NwW/a4LtO0nysCTPan9Y39KeY6TH6HDgoPYHPkkWJnlhW7eMZpjCltMRxyj+hmYIwDLg9jSTauw4xra/AK5PM7nGOu2VtkcmecI4xz+OZuz4drQz9gAkeUWShW1Py3VtcZcbV78GPDTJK5Os1S5PyCQmaOixLk1CWdbG9GrGHpO9sued6HO+EthojGEf0Nxj9g9JdmivMO9D84PoJ11O3sa7Ns135Lw0N6uP1ev6S2C3NDetrwW8lub+rqumcCxJExszByR5ZpJHtf/HrqcZxjfyXXkl4+SGsfatqstphhwfkmRBmskHHpRmNrWR494/7T2aM2A9mh/my2i+Q/anubd0NF8AXpDkOW3OWTvNhFL3H23jqloGfI/m3q0LqrknjDSTRPxje+HuVmAF3SdLOAr41yTbprFukn9IM5R9stZrz381zZC1983geSf6nK+kGf4/1m/n44H/l+SBSeZz1z1Xt3c5eZpHcIzcF/c37d/dmBenkzyFZtTFiX3la7e/NZJmSOCRNPdgX9slDk2NDSqtrP+guTflKuBnwDem6bj3Aj7QHvcKmp6Zt7d1HwO+CnwryQ3tebeFO4cRHgT8uO3yf1KaSSlWZPxJKTqp5j6x19P8YL+W5urZV8fY9g6am20fQzNt6VU0syGNN+vP8cAzgO+M/CBv/T2wNMkKmve/a+/Qywni3ZFmrP9lNJ/lwTSf76S0PWiH0Nw4fSXwKODHM3HeiT7nqjqb5rM6v/173rRv/3No7iU4lOZzfwHNlMt/AUjz3KjDxwnhKJpG/EuBd7Trr2z3fXr79zDizTSN/j/SJOLn0VzpnfBYkqZkzBxAM6z4yzQNot8D36dpZIzs96I0M5h+fJTjjrfvbjQXes6i+U76MncNo/sOzaMqrkgyciHl5Umm6/EV36QZRvwHmlEMt9A3VG1EVV1M06Pzdprvo4uBf2f833vH0fQA9d7fswbNhajLgGto7tPpdN9pVS2huZ/pMJrP6lyayRqm4lia93wpzWc/5v3D03DeiT7nkYbL1Ul+Pcr+n6UZNvoDmpx/C/C6kco0s86O9/y0c2jyw2ZtLDfT9nileXB9/1Dy3YGT2nzZa22av8sVNBd2fwq8c5zzahqkakZ7qiVJkiRptWUPlSRJkiRNkQ0qSZIkSZoiG1SSJEmSNEU2qCRJq70keydZkuTWJEf3lL+8nbRmZLkpSSV5fFt/QJLb+raZqVlEJUmz0CprUPUloxVJ7khyaE/9DknObpPZd0emQ23rkuTgJFe3ywfHm0pSkqQ+lwEH0szEdaeq+mL7jLT5VTWfZiaz84HeWbxO6N2mqs5fdWFLkobdvFV1ojZRAdA+1+BK2iko0zwA8yRgD+AU4L3ACcCT2l32BHaieYBmAafTJLzxpj1m4403ri222GIa34UkaZB+9atfXVVVCye7X1WdBJBkMTDqM3lauwPH1jRMgWsOkqTVx3j5Z5U1qPq8CPgz8MP29c7A0qoaaWAdAFyVZKv2eTO7A4dU1SVt/SE0zxoYt0G1xRZbsGTJkpl5B5KkVS7JRTN47EU0D9X+576qFyS5BrgcOKyqPjXOMfakuQjI5ptvbg6SpNXEePlnUPdQ9V8B3Bo4Y6Syqm4EzmvL71Hfrm/NKJLs2Y6TX7Js2bJpD1yStNraDfhhVV3QU/Yl4OHAQpoLefsneelYB6iqI6tqcVUtXrhw0h1pkqRZaJU3qJJsTvPE7WN6iucDy/s2XQ6sN0b9cmD+aPdRmcwkSVO0G3fPTVTVWVV1WVXdUVU/AT5GM8pCkiRgMD1UuwE/6rsCuAJY0LfdAuCGMeoXACumY4y7JElJngpsCnx5gk0LcFIkSdKdBtWgOqavbCnNhBPAnZNWPKgtv0d9u74USZI6SDIvydrAmsCaSdZO0nsf8e7AV6rqhr79XpjkPu1ss08EXg/896qLXJI07FZpgyrJU4DNaGf363Ey8Mgku7QJb3/gt+2EFADHAm9KslmSTYF9gKNXUdiSpNlvP+BmYF/gFe36fgBt3nkx97zYB7ArcC7NiIljgYOrarTtJElz1Kqe5W934KT+K4BVtSzJLsBhwBeAn9MksRFHAFsCZ7avP92WSZI0oao6ADhgjLpbgA3GqBtzAgpJkmAVN6iqaq9x6r4NbDVGXQFvaRdJkiRJGgqDeg6VJrDFvqcOOoQ558IP/MOgQ5CkgTP/DIY5SJq9BvUcKkmSJEma9WxQSZIkSdIU2aCSJEmSpCmyQSVJkiRJU2SDSpIkSZKmyAaVJEmSJE2RDSpJkiRJmiIbVJIkSZI0RTaoJEmSJGmKbFBJkiRJ0hTZoJIkSZKkKerUoEqyMMnCntePSnJgkpfOXGiSJEmSNNy69lB9CXgBQJKNgR8A/wQcnmSfGYpNkiRJkoZa1wbVo4GftesvAs6tqq2B3YC9ZiIwSZIkSRp2XRtU6wAr2vVnA19t138NPGC6g5IkSZKk2aBrg+qPwM5JHgDsCHyrLd8EuG4G4pIkSZKkode1QfVu4GDgQuBnVfXztvw5wG8mc8Ikuyb5fZIbk5yX5Olt+Q5Jzk5yU5LvJlnUs0+SHJzk6nb5YJJM5rySpLkryd5JliS5NcnRPeVbJKkkK3qWd/bUm38kSeOa12WjqjopyebApsAZPVXfBr7S9WRJ/o6mYfYS4BfA/dryjYGTgD2AU4D3AicAT2p33RPYCdgGKOB04Hzg8K7nliTNaZcBB9JcCFxnlPoNqur2UcrNP5KkcU3YQ5VkrSRXABtX1W+q6q8jdVX186o6exLnezfwnqr6WVX9taourapLgZ2BpVV1YlXdAhwAbJNkq3a/3YFDquqSdvtDgFdN4rySpDmsqk6qqv8Crp7kruYfSdK4JmxQVdVtwG00V+amLMmawGJgYZJzk1yS5LAk6wBb09PzVVU3Aue15fTXt+tbI0nS9LiozUufa0dNjJhU/kmyZzu0cMmyZctmKlZJ0hDpeg/VocDbknQaIjiGTYC1aKZdfzrwGOCxwH7AfGB53/bLgfXa9f765cD80caxm8wkSZNwFfAEYBHweJq888We+s75B6CqjqyqxVW1eOHChTMUsiRpmHRtID0d2B64NMnvgBt7K6vqHzsc4+b2z0Or6nKAJB+haVD9AFjQt/0C4IZ2fUVf/QJgRVXdo9esqo4EjgRYvHjxSvWqSZJWb1W1AljSvrwyyd7A5UkWVNX1TCL/SJLmpq4NqquYxOQTo6mqa5NcwuhDB5fSjFMHIMm6wIPa8pH6bWgmsqBdX4okSdNrJEeN9ECZfyRJ4+o6y9+rp+l8nwNel+QbNPdlvRH4GnAy8KEkuwCnAvsDv+2Z8OJY4E1JTqNJdvvQDEOUJGlC7ZD1ecCawJpJ1gZupxnmdx3N8xbvA3wc+F5VjQzzM/9IksbV9R4qAJIsTvKStgeJJOtO8r6q9wK/BP4A/J7mGVYHVdUyYBfgIOBaYFtg1579jqCZTv1M4Hc0ja4jJhO7JGlO249m6Pm+wCva9f2ALYFv0Awx/x1wK/DSnv3MP5KkcXVqDCXZBPgqzY27BTyE5jkcHwFuAd7Q5TjtjIGvbZf+um8DW91jp6augLe0iyRJk1JVB9A8kmM0x4+zn/lHkjSurj1UHwWuADYCbuopPxHYcbqDkiRJkqTZoOtwvR2AHdqJJXrLzwM2n/aoJEmSJGkW6NpDtQ7wl1HKF9IM+ZMkSZKkOadrg+oHwKt6XleSNYG3Av8z3UFJkiRJ0mzQdcjfW4DvJ3kCcC/gEGBrYH3gqTMUmyRJkiQNtU49VFV1FvAo4CfAt4C1aSakeGxVnTdz4UmSJEnS8Or8DKmqugJ41wzGIkmSJEmzStfnUG03RlXRTEpxXlVdM21RSZIkSdIs0LWH6ns0jSeAkXnTe1//NclXgVdW1Y3TF54kSZIkDa+us/z9A/B74BXAg9vlFcBSYJd2eQzwgekPUZIkSZKGU9ceqgOBN1RV7xTp5ydZBhxcVY9PcgdwKPC66Q5SkiRJkoZR1x6qRwCXjlJ+aVsHcCbwt9MRlCRJkiTNBl0bVGcB70hyr5GCdv3tbR3AA4Arpjc8SZIkSRpeXYf8vRY4Bbg0ye9oJqR4FPBX4PntNlsCn5z2CCVJkiRpSHVqUFXVz5M8kGYiiofRzOx3PPDFkVn9qurYGYtSkiRJkobQZB7seyNwxAzGIkmSJEmzSucGVZIHAE8H7kvfvVdV9ZFpjkuSJEmShl6nBlWSlwOfBW4HlnHXQ31p121QSZIkSZpzus7y9x7gEGBBVW1RVQ/sWbbserIk30tyS5IV7XJOT90OSc5OclOS7yZZ1FOXJAcnubpdPpgknd+lJGlOS7J3kiVJbk1ydE/5k5KcnuSaJMuSnJjkfj31ByS5rSdvrUjSOe9JklZ/XRtUmwCfrqo7puGce1fV/HZ5GECSjYGTgHcCGwJLgBN69tkT2AnYBng0zcyCe01DLJKkueEymofUf7av/D7AkcAWwCLgBuBzfduc0JO35lfV+TMdrCRp9uh6D9VpwLbATCWRnYGlVXUiNFcEgauSbFVVZwO7A4dU1SVt/SHAa4DDZygeSdJqpKpOAkiyGLh/T/nXe7dLchjw/VUbnSRpNuvaoDodODjJ1sCZwG29lSOJqqP3J/kAcA7wjqr6HrA1cEbP8W5Mcl5bfnZ/fbu+9STOKUlSF9sBS/vKXpDkGuBy4LCq+tRYOyfZk2ZUBZtvvvmMBSlJGh5dG1Qj06W/fZS6AtbseJy3AmcBfwF2BU5J8hhgPs1kF72WA+u16/Pb171185OkqnonyDCZSZKmJMmjgf2BF/YUf4lmSOCVNCM1vpLkuqo6frRjVNWR7fYsXry4RttGkrR66XQPVVWtMc7StTFFVf28qm6oqlur6hjgx8DzgBXAgr7NF9CMZWeU+gXAiv7GVHuOI6tqcVUtXrhwYdfQJElzWJIHA18H3lBVPxwpr6qzquqyqrqjqn4CfAx40aDilCQNn66TUsyUAkIzvGKbkcIk6wIP4q5hF3erb9f7h2RIkjRp7ayy3wbeW1Wfn2DzkbwlSRLQsUHVTlv+2iRL22nNt2zL903y4o7H2CDJc5KsnWRe+2yr7YBvAicDj0yyS5K1aYZc/LadkALgWOBNSTZLsimwD3D0pN6pJGnOavPO2jRD1NfsyUWbAd8BPlFV95joKMkLk9ynzYNPBF4P/PeqjV6SNMy69lC9AdiPZlx475W5S4G9Ox5jLZopa5cBVwGvA3aqqnOqahmwC3AQcC3NOPVde/Y9AjiFZkKM3wGnctd9XZIkTWQ/4GZgX+AV7fp+wB7AlsC7ep811bPfrsC5NEPQjwUOboesS5IEdJ+U4l+B11TVqUkO7Cn/NR1n22sbTU8Yp/7bwFZj1BXwlnaRJGlSquoA4IAxqt89zn4vnYl4JEmrj649VItoeob63QasM33hSJIkSdLs0bVBdT7wuFHKn0czDbokSZIkzTldh/x9GDgsyb1p7qF6cpJX0gzB++eZCk6SJEmShlmnBlVVfS7JPOB9wL2Bz9NMSPH6qjphBuOTJEmSpKHVtYeKqjoKOCrJxsAaVfXnmQtLkiRJkoZf1+dQrZFkDYCqugpYI8keSZ4yo9FJkiRJ0hDrOinFqTTPjSLJfGAJ8CHg+0l2m6HYJEmSJGmodW1QPZ7mSfIAOwPXA/cFXgO8eQbikiRJkqSh17VBtR5wXbu+I3ByVd1G08h60AzEJUmSJElDr2uD6k/AU5OsCzwHOL0t3xC4aSYCkyRJkqRh13WWv4/QTJW+ArgI+EFbvh1w5gzEJUmSJElDr+tzqI5I8ivgAcDpVfXXtuo84J0zFZwkSZIkDbPJPIdqCc3sfgAkWauqTp2RqCRJkiRpFuj6HKrXJ9ml5/VngJuTnJPkYTMWnSRJkiQNsa6TUrweWAaQZDvgxcDLgP8FDpmRyCRJkiRpyHUd8rcZcGG7/gLgxKr6UpIzgR/ORGCSJEmSNOy69lBdDyxs1/8O+J92/TZg7ekOSpIkSZJmg649VN8CjkryG+DBwNfb8q2BC2YiMEmSJEkadl17qP4v8GNgY+BFVXVNW/444PjJnjTJQ5LckuQLPWU7JDk7yU1JvptkUU9dkhyc5Op2+WCSTPa8kqS5KcneSZYkuTXJ0X115h9J0pR1fQ7V9cDrRil/1xTP+wnglyMvkmwMnATsAZwCvBc4AXhSu8mewE7ANkABpwPnA4dP8fySpLnlMuBA4DnAOiOF5h9J0srq2kN1pyR/m2Tz3mWS++8KXMdd92EB7AwsraoTq+oW4ABgmyRbtfW7A4dU1SVVdSnNzIKvmmzskqS5qapOqqr/Aq7uqzL/SJJWStfnUK2f5JgkNwOX0tw31bt0kmQB8B5gn76qrYEzRl5U1Y3AeW35Perb9a2RJGnlTGv+SbJnO7RwybJly2YgXEnSsOnaQ/VhmuEOOwG30DyD6t+BS4CXTOJ87wU+U1UX95XPB5b3lS0H1hujfjkwf7Rx7CYzSdIkTFv+AaiqI6tqcVUtXrhw4WibSJJWM11n+Xsu8NKq+mGSO4BfVdUJSS4H9gK+PNEBkjwGeDbw2FGqVwAL+soWADeMUb8AWFFV1X+gqjoSOBJg8eLF96iXJKnHtOUfSdLc1LWHagPgonZ9ObBRu/5T4Ckdj/EMYAvgT0muAN4M7JLk18BSmh4wAJKsCzyoLae/vl1fiiRJK8f8I0laKV0bVOcBW7brvwd2bYc77AxcM+Zed3ckTZJ6TLscDpxKM+PSycAjk+ySZG1gf+C3VXV2u++xwJuSbJZkU5p7sI7ueF5J0hyXZF6bX9YE1kyydpJ5mH8kSSupa4PqaODR7foHaIb5/QX4EHBwlwNU1U1VdcXIQjOM4paqWlZVy4BdgIOAa4FtgV17dj+CZjrbM4Hf0TTEjugYuyRJ+wE3A/sCr2jX9zP/SJJWVtfnUH20Z/07SR4OPB74Y1WdOZUTV9UBfa+/DWw1xrYFvKVdJEmalDbnHDBGnflHkjRlXSeluJuquoi77qmSJEmSZpUt9j110CHMSRd+4B8GHcK06/xg3yQ7JflBkqva5YdJ/mkmg5MkSZKkYdaphyrJPsD7aG7OPbotfjJwXJJ3VtWHZyY8Sas7rxCueqvj1UFJkgal65C/NwN7V9VRPWWfTfIL4D00D/6VJEmSpDml65C/+cB3Ryn/blsnSZIkSXNO1wbVfwEvGqV8F+Cr0xaNJEmSJM0iXYf8nQvsm+SZwE/bsie1y0eSvGlkw6r6yPSGKEmSJEnDqWuD6lU0Dzx8aLuMuBZ4dc/rAmxQSZIkSZoTuj7Y94EzHYgkSZIkzTadn0MlSZIkSbo7G1SSJEmSNEU2qCRJkiRpimxQSZIkSdIUjdmgSvLZJOu169sl6TojoCRJkiTNCeP1UL0CWLdd/y6w4cyHI0mSJEmzx3i9ThcCr0vyLSDAk5NcO9qGVfWDGYhNkiRJkobaeA2qfweOAt5G88Dek8fYroA1pzkuSZIkSRp6Yzaoquq/gf9OsgFwDbA18OdVFJckSZIkDb0JZ/mrquuAZwJ/rKqrR1u6nizJF5JcnuT6JH9IskdP3Q5Jzk5yU5LvJlnUU5ckBye5ul0+mCSTfK+SJN1DkhV9yx1JDm3rtkhSffXvHHTMkqTh0Wnmvqr6fpJ7JdkNeATNML+zgOOq6tZJnO/9wL9U1a1JtgK+l+Q3wEXAScAewCnAe4ETgCe1++0J7ARs0577dOB84PBJnFuSpHuoqvkj60nWBa4ETuzbbIOqun2VBiZJmhU6PYcqySOAPwAfAbalaeh8FPhDkod3PVlVLe1pgFW7PAjYGVhaVSdW1S3AAcA2baMLYHfgkKq6pKouBQ4BXtX1vJIkdfQimuHtPxx0IJKk2aHrg30/BvwvsHlVPb2qng5sDpwB/MdkTpjkk0luAs4GLgdOo7k/64yRbarqRuC8tpz++nZ9ayRJml67A8dWVfWVX5TkkiSfS7LxWDsn2TPJkiRLli1bNrORSpKGQtcG1VOBt1fV9SMF7fo7gKdN5oRV9VpgPeDpNMP8bgXmA8v7Nl3ebsco9cuB+aPdR2UykyRNRZLNge2BY3qKrwKeACwCHk+Tl7441jGq6siqWlxVixcuXDiT4UqShkTXBtUtwAajlK/f1k1KVd1RVT8C7g/8G7ACWNC32QLghna9v34BsGKUK4gmM0nSVO0G/KiqLhgpqKoVVbWkqm6vqiuBvYEdk/TnLEnSHNW1QXUKcFSSpyZZs12eBhwBfHUlzj+P5h6qpTQTTgB33hQ8Uk5/fbu+FEmSps9u3L13ajQjF/KcaVaSBHRvUL0B+CPNTbq3tMv3aSaqeGOXAyS5b5Jdk8xvG2TPAV4KfIfmocGPTLJLkrWB/YHfVtXZ7e7HAm9KslmSTYF9gKM7xi5J0riSPAXYjL7Z/ZJsm+RhSdZIshHwceB7VdU/TF2SNEd1nTb9OuCFSR4MPJzmytxZVXXuJM5VNMP7DqdpyF0EvLF9gDBJdgEOA74A/BzYtWffI4AtgTPb159uyyRJmg67AydV1Q195VsC7wPuC1xP89iOl67i2CRJQ6xTg2pE24CaTCOqd99lNDf7jlX/bWCrMeoKeEu7SJI0rapqrzHKjweOX8XhSJJmka5D/iRJkiRJfWxQSZIkSdIU2aCSJEmSpCmasEGVZF6S17az60mSJEmSWhM2qKrqduBDwFozH44kSZIkzR5dh/z9DHjcTAYiSZIkSbNN12nTjwIOSbII+BVwY29lVf16ugOTJEmSpGHXtUF1XPvnR0apK2DN6QlHkiRJkmaPrg2qB85oFJIkSZI0C3VqUFXVRTMdiCRJkiTNNp2fQ5XkuUm+luSsJA9oy/ZIssPMhSdJkiRJw6tTgyrJy4EvAX+kGf43MoX6msBbZiY0SZIkSRpuXXuo3gK8pqr+H3B7T/nPgMdMd1CSJEmSNBt0bVA9BPjpKOUrgAXTF44kSZIkzR5dG1SXAQ8dpXw74LzpC0eSJEmSZo+uDaojgY8neWr7+gFJdgc+CHxqRiKTJEmSpCHXddr0DyZZHzgdWBv4LnAr8OGq+sQMxidJkiRJQ6vrg32pqnckOQh4BE3P1llVtWLGIpMkSZKkIdf5OVStAm4BbgLumMyOSe6V5DNJLkpyQ5LfJHluT/0OSc5OclOS7yZZ1FOXJAcnubpdPpgkk4xdkqRRJflekluSrGiXc3rqxsxPkiR1fQ7VvZL8B3ANcAbwW+CaJB9LsnbHc80DLga2B9YH3gl8KckWSTYGTmrLNgSWACf07LsnsBOwDfBo4PnAXh3PK0lSF3tX1fx2eRhAh/wkSZrjug75+xSwI7AHd02f/mTg/cB6wD9PdICquhE4oKfoa0kuAB4PbAQsraoTAZIcAFyVZKuqOhvYHTikqi5p6w8BXgMc3jF+SZKmYmfGz0+SpDmu65C//wO8uqq+WFXnt8sXgX8BXjSVEyfZhGYq9qXA1jQ9X8Cdja/z2nL669v1rRlFkj2TLEmyZNmyZVMJTZI0N70/yVVJfpzkGW3ZRPnpbsxBkjT3dG1Q3QhcOkr5pcDNkz1pkrWALwLHtFf45gPL+zZbTtP7xSj1y4H5o91HVVVHVtXiqlq8cOHCyYYmSZqb3gpsCWxG86iQU5I8iInz092YgyRp7unaoDoUeFeSdUYK2vV3tnWdJVkD+DzwF2DvtngFsKBv0wXADWPULwBWVFVN5tySJI2mqn5eVTdU1a1VdQzwY+B5TJyfJElz3Jj3UCX5al/RM4BLk/y2ff2odv91u56s7VH6DLAJ8Lyquq2tWkpzn9TIdusCD2rLR+q3AX7Rvt6mp06SpOlWQJg4P0mS5rjxJqW4uu/1V/peXzCF830KeDjw7KrqHSp4MvChJLsApwL7A7/tueH3WOBNSU6jSXL7MMmeMUmSRpNkA2Bb4PvA7cBLgO2AN9LMbjtefpIkzXFjNqiq6tXTeaL2uR17AbcCV/Tc/rRXVX2xTVaHAV8Afg7s2rP7ETRj289sX3+6LZMkaWWtBRwIbEXzjMWzgZ2q6hyACfKTJGmO6zpt+kqrqotohk+MVf9tmmQ2Wl0Bb2kXSZKmTVUtA54wTv2Y+UmSpE4NqiT3oXmG1DOB+9I3mUVV3XfaI5MkSZKkIde1h+pYmmduHANcSXMfkyRJkiTNaV0bVM8Atq+qX89gLJIkSZI0q3R9DtV5k9hWkiRJkuaEro2kNwDvT7JNkjVnMiBJkiRJmi26Dvk7F1gH+DVAz5TnAFSVjSxJkiRJc07XBtXxwPrA63FSCkmSJEkCujeoFgNPrKrfzWQwkiRJkjSbdL2H6ixgwUwGIkmSJEmzTdcG1X7AR5I8O8kmSTbsXWYyQEmSJEkaVl2H/J3W/vkt7n7/VNrXTkohSZIkac7p2qB65oxGIUmSJEmzUKcGVVV9f6YDkSRJkqTZplODKsnjxquvql9PTziSJEmSNHt0HfK3hOZeqd4n+vbeS+U9VJIkSZLmnK4Nqgf2vV4LeCzwDuBt0xqRJEmSJM0SXe+humiU4nOTLAfeBXx9WqOSJEmSpFmg63OoxnIB8JhpiEOSJEmSZp1ODar+B/km2SjJI4H3A+d0PVmSvZMsSXJrkqP76nZIcnaSm5J8N8minrokOTjJ1e3ywSS5xwkkSZqkJPdK8pkkFyW5Iclvkjy3rdsiSSVZ0bO8c9AxS5KGR9d7qK7i7pNQQDNBxcXASyZxvsuAA4HnAOvceaBkY+AkYA/gFOC9wAnAk9pN9gR2ArZp4zgdOB84fBLnliRpNPNo8tn2wJ+A5wFfSvKonm02qKrbBxGcJGm4TfXBvn8FlgHnTibBVNVJAEkWA/fvqdoZWFpVJ7b1BwBXJdmqqs4GdgcOqapL2vpDgNdgg0qStJKq6kbggJ6iryW5AHg88KuBBCVJmjWG5cG+WwNn9JzvxiTnteVn99e361uPdqAke9L0aLH55pvPVLySpNVUkk2AhwJLe4ovSjIyQuLfq+qqMfY1B0nSHDPuPVSj3Ds16jINccwHlveVLQfWG6N+OTB/tPuoqurIqlpcVYsXLlw4DaFJkuaKJGsBXwSOaUdIXAU8AVhE02O1Xls/KnOQJM09E/VQjXbvVL/qcJyJrAAW9JUtAG4Yo34BsKKqJopNkqROkqwBfB74C7A3QFWtoHm4PcCVSfYGLk+yoKquH0ykkqRhMlFDqP/eqV5/D7wBmI6bdJfS3CcFQJJ1gQdx13CLpTQTUvyifb0Ndx+KIUnSlLUjHj4DbAI8r6puG2PTkQt5zjQrSQImaFCNdu9UkscBBwPbAUfQzMjXSZJ57TnXBNZMsjZNg+xk4ENJdgFOBfYHftsOtwA4FnhTktNoktk+wKFdzytJ0gQ+BTwceHZV3TxSmGRb4Drgj8B9gI8D36uq/mHqkqQ5qvODfZM8MMlxwM+Ba4BHVNXrq2rZJM63H3AzsC/winZ9v/YYuwAHAdcC2wK79ux3BM106mcCv6NpdB0xifNKkjSq9rmHe9E8qP6KnudNvRzYEvgGzRD03wG3Ai8dVKySpOEz4b1PSTai6TH6V+DHwJOrasn4e42uqg7g7lPT9tZ9G9hqjLoC3tIukiRNm6q6iPGH8B2/qmKRJM0+E83y93bgPJqHHb6wqp411caUJEmSJK1uJuqhOpBmWN4lwGuTvHa0jarqH6c7MEmSJEkadhM1qI5l4mnTJUmSJGlOmmiWv1etojgkSZIkadbpPMufJEmSJOnubFBJkiRJ0hTZoJIkSZKkKbJBJUmSJElTZINKkiRJkqbIBpUkSZIkTZENKkmSJEmaIhtUkiRJkjRFNqgkSZIkaYpsUEmSJEnSFNmgkiRJkqQpskElSZIkSVNkg0qSJEmSpsgGlSRJkiRN0axpUCXZMMnJSW5MclGSlw06JknS3GAOkiSNZd6gA5iETwB/ATYBHgOcmuSMqlo60KgkSXOBOUiSNKpZ0UOVZF1gF+CdVbWiqn4EfBV45WAjkySt7sxBkqTxzJYeqocCd1TVH3rKzgC2798wyZ7Anu3LFUnOWQXx6S4bA1cNOoipyMGDjkCzjP/WB2PRAM5pDpo9/H+pucJ/66vemPlntjSo5gPL+8qWA+v1b1hVRwJHroqgdE9JllTV4kHHIc00/63PKeagWcL/l5or/Lc+XGbFkD9gBbCgr2wBcMMAYpEkzS3mIEnSmGZLg+oPwLwkD+kp2wbwZmBJ0kwzB0mSxjQrGlRVdSNwEvCeJOsmeSrwQuDzg41Mo3Coi+YK/63PEeagWcX/l5or/Lc+RFJVg46hkyQbAp8F/g64Gti3qo4bbFSSpLnAHCRJGsusaVBJkiRJ0rCZFUP+JEmSJGkY2aCSJEmSpCmyQSVJkiRJU2SDSpI6SHKvJAclOT/J8rZsxyR7Dzo2SdLqzRw03GxQaVokWSvJ05O8pH29bpJ1Bx2XNI0+CjwSeDkwMpvPUuDfBhaRJMAcpDnBHDTEnOVPKy3Jo4CvArcC96+q+UmeB+xeVS8ZbHTS9EhyOfDgqroxyTVVtWFbfl1VbTDY6KS5yxykucAcNNzsodJ0+BSwf1VtBdzWln0feNrgQpKm3V+Aeb0FSRbSPJNI0uCYgzQXmIOGmA0qTYetgS+06wVQVTcC6wwsImn6nQgck+SBAEnuBxwG/OdAo5JkDtJcYA4aYjaoNB0uBB7fW5DkicC5A4lGmhlvp/m3fiawAfBH4DLg3YMLSRLmIM0N5qAh5j1UWmlJng98Bjgc2Ac4CPhX4DVV9a1BxibNhHaYxVXlF6g0cOYgzTXmoOFjg0rTIsnjgD2ARcDFwFFV9avBRiWtnCRbdtmuqs6f6Vgkjc0cpNWROWj2sEElSWNI8leaezIyzmZVVWuuopAkSXOEOWj2sEGlKUnyni7bVdX+Mx2LJGluMQdJGibzJt5EGtUDBh2AJGnOMgdJGhr2UElSB0nmAa8Ftgc2pmcIRlVtN6i4JEmrP3PQcHPadE2bJOsleWCSLUeWQcckTaOPAnsBP6CZovkrwH2B7wwyKEkNc5BWc+agIWYPlVZakkcAXwS24a6bJ0ceruiNklotJLkUeHJV/SnJdVW1QZKtgCOqavtBxyfNVeYgzQXmoOFmD5WmwyeB7wIbAtcD9wGOAHYfZFDSNLs3zXTMADcnuXdVnQ08doAxSTIHaW4wBw0xe6i00pJcC9y3qm7ruWqyLvC7qnrgoOOTpkOSnwBvrKpfJDkF+D3Nj7eXV9XDBxudNHeZgzQXmIOGmz1Umg63AGu161cl2Zzm39ZGgwtJmnZvAG5v198EPA54AbDnwCKSBOYgzQ3moCFmD5VWWpIvAadV1dFJPgD8I02C+1NV7TTQ4CRJqzVzkKRBs0GlaZVkDeBlwHzg2Kq6acAhSdMmyRbAo2n+fd+pqo4bSECS7sYcpNWZOWh42aDSSkuyPvB6mhsj+/+T7ziQoKRpluRtwP7AUuDmnqryGSDS4JiDNBeYg4bbvEEHoNXCicCawMnc/T+5tDrZB3h8VZ016EAk3Y05SHOBOWiI2aDSdHgSsFFV3TboQKQZdDVw4aCDkHQP5iDNBeagIeYsf5oOPwKcslOruzcCRyZZnGTz3mXQgUlznDlIc8EbMQcNLe+h0kpLcl/gNODnwJW9dVX1noEEJU2zJC8EjgI27quqqlpzACFJwhykucEcNNwc8qfpcBDwAJqu6AU95bbWtTr5JPB24D/xPg1pmJiDNBeYg4aYPVRaaUluAB5aVZcPOhZppiS5Eti0qu4YdCyS7mIO0lxgDhpu3kOl6XA+4M3AWt19GNg3SQYdiKS7MQdpLjAHDTF7qLTSkrwZ2Bk4lHuOX//OQIKSplmSi4G/Bf5CM9vSnarKm4KlATEHaS4wBw03G1RaaUkuGKOqqmrLVRqMNEOSbD9WXVV9f1XGIuku5iDNBeag4WaDSpIkSZKmyHuoJKmDJPdKclCS85Msb8t2TLL3oGOTJK3ezEHDzQaVJHXzUeCRwMu5azrmpcC/DSwiSdJcYQ4aYg75k6QOklwOPLiqbkxyTVVt2JZfV1UbDDY6SdLqzBw03OyhkqRu/kLfw9CTLKRvtiVJkmaAOWiI2aCSpG5OBI5J8kCAJPcDDqN5ar0kSTPJHDTEbFBJ0hj6bvY9ArgQOBPYAPgjcBnwnlUemCRptWcOmj28h0qSxpBkeVWt365fX1UL2vWFwFXlF6gkaYaYg2aPeRNvIklz1nlJDqGZSWmtJK8GMlKZNKtV9dnBhCdJWo2Zg2YJe6gkaQxJHgq8BVgEPBP44SibVVU9a5UGJkla7ZmDZg8bVJLUQZL/qaodBh2HJGnuMQcNNxtUkiRJkjRFzvInSZIkSVNkg0qSJEmSpsgGlSRJkiRNkQ0qSZIkSZoiG1SSJEmSNEX/HxijmmEXYCfjAAAAAElFTkSuQmCC",
      "text/plain": [
       "<Figure size 864x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Check the sex balance\n",
    "train_sex_value_counts = df_train.sex.value_counts()\n",
    "test_sex_value_counts = df_test.sex.value_counts()\n",
    "\n",
    "plt.figure(figsize=(12, 4))\n",
    "\n",
    "plt.subplot(121)\n",
    "train_sex_value_counts.plot.bar()\n",
    "train_sex_ratio = train_sex_value_counts['male']/train_sex_value_counts['female']\n",
    "plt.title(f'Train set: male vs female ratio: {train_sex_ratio:.2f}')\n",
    "plt.ylabel('Number of passengers')\n",
    "\n",
    "plt.subplot(122)\n",
    "test_sex_value_counts.plot.bar()\n",
    "test_sex_ratio = test_sex_value_counts['male']/test_sex_value_counts['female']\n",
    "plt.title(f'Test set: male vs female ratio: {test_sex_ratio:.2f}')\n",
    "\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, lets check that the relative number of passenger per class is similar between the train and test sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:38.404343Z",
     "start_time": "2020-05-01T17:12:38.078737Z"
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA1QAAAEUCAYAAAAspncYAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAofUlEQVR4nO3debhkVXnv8e9PGgGBBpEWRWmIAhpAQW3nARWNIWow4jUBxSlIRH2co0RDRKIGp2vivThgHNEQh4ByRSNBYxQVFVQ0YMCBQUSgmRq6QUF47x97HyzLOqd3V1fVOXXO9/M89bBr7VV7v2efptZ591p7rVQVkiRJkqQNd7v5DkCSJEmSppUJlSRJkiQNyYRKkiRJkoZkQiVJkiRJQzKhkiRJkqQhmVBJkiRJ0pBMqLToJPlCkmfPdxzaOEkeneSS+Y5DkrQ0JDkqycfmOw5NHxMqLQhJ1va8bk1yY8/7Z2zIsapq/6r6yLhi7ZdklySVZNmkzilJGr1RtkXt8b6S5NAxxPmcJKeP+riShuMfgFoQqmqrme0kFwKHVtVp/fWSLKuq30wyNo1Okk2q6pb5jkOSBunaFmk6+DeDJsUeKi1oM8O+krwmyWXAh5LcMcnnkqxOck27ffeez9x2R3DmLl6St7d1L0iy/xzne02SXyS5Psl5SfZry2+X5IgkP01yVZJPJtmu/dhX2/9e297FfGiHn+uoJJ9O8on2XN9NsnfP/plzXZ/k3CR/1rNv1yT/lWRNkiuTfKItT5J3Jrmi3feDJHu1+zZrr8HFSS5P8t4kW/Rd41e2n/1lkuf2nO9OSf5fkuuSfCfJG3vvjCa5d5L/SHJ1e82e3rPvw0nek+TzSdYBjxlwLbZL8qEkl7a/o8/Mcs1Gek0kqau52oAkmyf5WFt+bfs9uUOSNwGPBP5v2zb83wHHHfjZdt82ST7Qfif/ov3u3STJHwLvBR7aHvfajj/DV5L8Q5Jvt9+Hn+1px0jyqSSXtfu+mmTPnn1/0n7vXt/G8qq2fPs0bfC1bRvwtSS3a/ftmOTf0rTVFyR5Sc/xjmqv4UfbY56TZFXP/vsn+V6771Np2so39ux/UpLvt+f9RpL79uy7ME1b/gNgXQaMHkmyZ0+7dXmS185yzUZ6TbR4+QvWNLgLsB2wM3AYzb/bD7XvVwI3Ar/XUPV4MHAesD3wVuADSdJfKcm9gBcDD6yqrYEnABe2u18CPAXYF9gRuAY4tt33qPa/21bVVlX1zSQr2y/TlXPEdQDwqfZn+xfgM0k2bff9lKYh3gZ4A/CxJHdt9/09cCpwR+DuwP9py/+ojWV3YFvgz4Gr2n1vacv3AXYF7gb8XU8sd2nPdTfgL4Fjk9yx3XcssK6t8+z2NXPNtgT+o43/zsBBwLt7Gx3gYOBNwNbAoCEqxwN3APZsj/HOQRdrDNdEkrqaqw14Ns330k7AnYAXADdW1euArwEvbtuGFw847sDPtvs+AvyG5jv7fjTfZ4dW1Y/aet9sj7stQJKD2yRiLs8Cntf+DL8B3tWz7wvAbjTfw98FPt6z7wPAX7Vt417Al9vyVwKXACuAHYDXAtUmEP8POJumXdkPeFmSJ/Qc80+Bf6X5bj6Zth1PcnvgJODDNO3jCUDvDbT7Ax8E/qq9Zu8DTk6yWc+xDwKeSNMu/04PVZKtgdOAf2+vw67Al2a5XiO7JrMcX4tFVfnytaBeNEnM49rtRwM3AZvPUX8f4Jqe91+haXQAngP8pGffHWi+2O4y4Di7AlcAjwM27dv3I2C/nvd3BW6mGTa7S3vMZRvwMx4FnNHz/nbAL4FHzlL/+8AB7fZHgeOAu/fVeSxwPvAQ4HY95aFJiO7ZU/ZQ4IKea3xjb/ztdXgIsEn7c96rZ98bgdPb7T8HvtYXx/uA17fbHwY+Osd1uCtwK3DHAfseDVwyx2eHvia+fPnytb5XX1s0VxvwPOAbwH0HHOO29miWcwz8LM0f4r8GtugpOwj4z3b7OTPfwxvw83wFOKbn/R407esmA+pu27Zr27TvL6ZJYJb31Tsa+Cywa1/5g4GL+8r+BvhQu30UcFpfLDe2248CfgGkZ//pwBvb7fcAf9937POAfXt+b8+b4zocBHxvln1HAR+bZd9GXRNfi/tlD5Wmweqq+tXMmyR3SPK+JBcluY5myN22STaZ5fOXzWxU1Q3t5lb9larqJ8DLaL5Qr0jyr0l2bHfvDJzU9jpdS9O43kLT6A3r5z3nvpXmjtaOAEme1TOc4Vqau1/bt9VfTZMkfbsdJvG89hhfprnDdyxweZLjkiynuUt2B+CsnuP9e1s+46r63bt4N9BcoxU0fzD8vGdf7/bOwINnjtse+xk0vVmD6vfbCbi6qq6Zow4w8msiSRtirjbgeOCLwL+mGbr81p7RBusz22d3BjYFftlzzvfR9JRsjN7v44vac2yfZijhMWmGNF7Hb0dnzHzHHgj8CXBRmuHVM0Pb3wb8BDg1yc+SHNGW7wzs2Nc2vJbfbTMv69m+Adi8HZ63I/CLqurt1elvd17Zd+yd2s8Nqt9vJ5oRD3MawzXRImZCpWnQ31X+SuBewIOrajm/HXL3e8P4NvhEVf9SVY+g+cIumqFy0Hw5719V2/a8Nq+qXwyIr6udZjba4RF3By5NsjPwfprhh3eqZjjHf9P+fFV1WVU9v6p2pLk79u4ku7b73lVVD6AZPrc78NfAlTQ9UHv2xL5N9Tx8PYfVNMNC7t5TtlPP9s+B/+q7LltV1eE9dea6Pj8Htkuy7VxBjOGaSNKGmLUNqKqbq+oNVbUH8DDgSTRD62A97cMcn/05TQ/V9j3nW15VM8OpN7rdoRkyfzNNG3EwzTD0x9EMQdylrTPzHfudqjqAJqH7DPDJtvz6qnplVd0DeDLwijTPHv+cZhRE7/Xauqr+pEOMvwTu1jc0v7/deVPfse9QVSf01Flfu3PPDnGM+ppoETOh0jTamiZBuDbNA7WvH8VBk9wryWPbcdi/as8xMyPde4E3tX/Yk2RFkgPafatphq3dYwNP+YAkT23vyL2MpvE8A9iSpjFY3Z7ruTS9MTNx/q/8dhKOa9q6tyR5YJIHt3c317U/wy1t79f7gXcmuXN7jLv1jWUfqJoZ+U4Ejmp7Bu/Nb/9QAPgcsHuSQ5Js2r4emOah6fWqql/SjFF/d5rJRjZN8qgBVUd6TbrEJkk9Zm0DkjwmyX3aURLX0SQpM98zlzNH2zDbZ9vvxlOBdyRZnmZSjHsm2bfnuHdP87zRhnhmkj2S3IFmaNqn2+/5rWnaoKtoRjS8uSfG2yd5RpJtqurmNs5b2n1PSjMpUHrKbwG+DVyXZnKILdrenr2SPLBDjN9sj/HiJMva6/ygnv3vB17QfrcnyZZJnpjm2aguPgfcJcnL0kzYtHWSBw+oN+prokXMhErT6B+BLWjuqp1BM3xtFDYDjmmPexnNXaeZmX/+ieah2VOTXN+e98Fw2zDCNwFfb4cfPCTNpBRrM/ekFJ+leQbpGuAQ4Knt3cpzgXfQNCqXA/cBvt7zuQcC30qyto3ppVV1AbCcpqG5hmYox1XA29vPvIZmCMIZ7dCF02h6+bp4Mc3ductohqecQNPIUFXX0zwo/RfApW2dt9Bcy64Oofkj4n9ont16WX+FMV0TSepq1jaAZojzp2n+eP4R8F/Ax3o+97Q0M5i+i98312efBdweOJfmO+zTNM9uQTMBwjnAZUmuBGj/wD9nPT/H8TTPtl4GbE4z2QY0z6FeRPPs0rntz9frEODCtv14AfDMtnw3mvZkLc3387ur6ittkvZkmmecL6BpV/+Zpi2ZU1XdBDyVZoKka9tzfY7ftjtnAs+nGc59DU3b9pz1Hbfn+NcDj2/juwz4MQNmoGXE16RrfJpO+d0hqpImIclRNA+sPnN9dReaJG+hmdTj2eutLElaEJJ8hWbChX+e71g2VJJvAe+tqg/NdyzSIPZQSZpTmnWm7tsOrXgQzV3Dk+Y7LknS4pRk3yR3aYf8PRu4L6MbjSKN3O8tdiZJfbamGea3I82QvHfQDFeUJGkc7kUzycNWNDPyPa19rkxakBzyJ0mSJElDcsifJEmSJA3JhEqSJEmShrSonqHafvvta5dddpnvMCRJY3LWWWddWVUr5juOQWyDJGnxmqv9WVQJ1S677MKZZ54532FIksYkyUXzHcNsbIMkafGaq/1xyJ8kSZIkDcmESpIkSZKGZEIlSZIkSUMyoZIkSZKkIZlQSZIkSdKQTKgkSZIkaUgmVJIkSZI0JBMqSZIkSRqSCZUkSZIkDWnZfAewmOxyxCnzHcK8ufCYJ853CJK0ZNn+SNL8sYdKkiRJkoZkQiVJkiRJQzKhkiQtekm2S3JSknVJLkpy8Cz1npPkliRre16Pnmy0kqRp4jNUkqSl4FjgJmAHYB/glCRnV9U5A+p+s6oeMcngJEnTyx4qSdKilmRL4EDgyKpaW1WnAycDh8xvZJKkxcCESpK02O0O3FJV5/eUnQ3sOUv9+yW5Msn5SY5MMutojiSHJTkzyZmrV68eZcySpClhQiVJWuy2Atb0la0Bth5Q96vAXsCdaXq1DgL+erYDV9VxVbWqqlatWLFiROFKkqaJCZUkabFbCyzvK1sOXN9fsap+VlUXVNWtVfVD4GjgaROIUZI0pUyoJEmL3fnAsiS79ZTtDQyakKJfARlLVJKkRcGESpK0qFXVOuBE4OgkWyZ5OHAAcHx/3ST7J9mh3b43cCTw2UnGK0maLiZUkqSl4IXAFsAVwAnA4VV1TpKV7VpTK9t6+wE/SLIO+DxNIvbmeYlYkjQVXIdKkrToVdXVwFMGlF9MM2nFzPtXAa+aXGSSpGlnD5UkSZIkDcmESpIkSZKGZEIlSZIkSUOaaEKVZLskJyVZl+SiJAfPUu85SW5pHxSeeT16krFKkiRJ0vpMelKKY4GbgB2AfYBTkpxdVYPWAvlmVT1iksFJkiRJ0oaYWA9Vki2BA4Ejq2ptVZ0OnAwcMqkYJEmSJGmUJjnkb3fglqo6v6fsbGDPWerfL8mVSc5PcmSSgb1pSQ5LcmaSM1evXj3qmCVJkiRpVpNMqLYC1vSVrQG2HlD3q8BewJ1perUOAv560EGr6riqWlVVq1asWDHCcCVJkiRpbpNMqNYCy/vKlgPX91esqp9V1QVVdWtV/RA4GnjaBGKUJEmSpM4mmVCdDyxLsltP2d7AoAkp+hWQsUQlSZIkSUOaWEJVVeuAE4Gjk2yZ5OHAAcDx/XWT7J9kh3b73sCRwGcnFaskSZIkdTHphX1fCGwBXAGcABxeVeckWdmuNbWyrbcf8IMk64DP0yRib55wrJIkSZI0p4muQ1VVVwNPGVB+Mc2kFTPvXwW8anKRSZIkSdKGG7qHKsmmowxEkiRJkqZNp4QqyUuSHNjz/gPAjUnOS3KvsUUnSZIkSQtY1x6qlwCrAZI8Cng6cDDwfeAdY4lMkiRJkha4rs9Q3Q24sN1+MvCpqvpkkh8CXxtHYJIkSZK00HXtoboOWNFuPx74Urt9M7D5qIOSJEmSpGnQtYfqVOD9Sb4H7Ap8oS3fE7hgHIFJkiRJ0kLXtYfqRcDpwPbA09rpzwHuT7OelCRJkiQtOevtoUqyDHgmcExVXdq7r6peP67AJEmSJGmhW28PVVX9Bngb4LpTkiRJktSj65C/M4AHjDMQSZIkSZo2XSeleD/w9iQrgbOAdb07q+q7ow5MkiRJkha6rgnVv7T//d8D9hWwyWjCkSRJkqTp0TWh+oOxRiFJkiRJU6hTQlVVF407EEmSJEmaNl0npSDJ/kk+l+TcJDu1ZYcm2W984UmSJEnSwtUpoUryDOCTwI9phv/NTKG+CfDq8YQmSZIkSQtb1x6qVwPPr6qXA7/pKT8D2GfUQUmSJEnSNOiaUO0GfHNA+Vpg+ejCkSRJkqTp0TWhuhTYfUD5o4Cfji4cSZIkSZoeXROq44B3JXl4+36nJM8G3gq8ZyyRSZIkSdIC1ymhqqq3AicC/wFsCfwn8F7gvVV17PjCkyRp4yXZLslJSdYluSjJwR0+8+UklaTrmo2SpCWocyNRVa9L8iZgD5pE7NyqWju2yCRJGp1jgZuAHWgmUzolydlVdc6gyu3stiZSkqT12qDGoqpuAM4cUyySJI1cki2BA4G92huBpyc5GTgEOGJA/W2A1wPPYvCETJIk3aZTQpXkP4EasKuAXwE/AT5SVd8dYWySJI3C7sAtVXV+T9nZwL6z1H8zzfPBl63vwEkOAw4DWLly5UaGKUmaRl0npfgRcH/grsAl7euubdkVwCOAbyXZbxxBSpK0EbYC1vSVrQG27q+YZBXwcOD/dDlwVR1XVauqatWKFSs2OlBJ0vTpOuTvV8CHq+plvYVJ3gFUVT0gyT8BbwS+NNoQJUnaKIPWTFwOXN9bkOR2wLuBl1bVb5JMKDxJ0jTr2kP1bJoHevu9D3huu30czYQVkiQtJOcDy5Ls1lO2N9A/IcVyYBXwiSSXAd9pyy9J8sjxhylJmkZde6gC7An8uK98j3YfwM3ArSOKS5KkkaiqdUlOBI5OcijNLH8HAA/rq7oG2LHn/U7At4EHAKsnEKokaQp1Tag+Anygvbv3HZrJKB4EvAb4cFtnX+C/Rx2gJEkj8ELggzTP/V4FHF5V5yRZCZwL7FFVF9MzEUWSzdvNy6vqN5MOWJI0HbomVK8CLgdeDtylLbsMeBvw9vb9F4EvjDQ6SZJGoKquBp4yoPximkkrBn3mQn47CkOSpIE6PUNVVbdU1TFVtSOwLbBtVe1YVW+pqlvaOhdX1SVzHceV6iVJkiQtJhucpFTVdRtxPleqlyRJkrRodOqhanuW3pPk/CTXJrmu99XxGDMr1R9ZVWur6nRgZqX6QfVnVqp/dbcfRZIkSZImq2vvzweA+9FMjX4pzaQUG2osK9W7Sr0kSZKk+dI1odoPeHxVfWsjzjXMSvUvBe4+10Gr6jiaRI9Vq1YNk+hJkiRJ0lC6Lux7Bc1K8xtjqJXqN/KckiRJkjQ2XROq19EsiDhwatmOXKlekiRJ0qLSdcjf3wK7AFckuQi4uXdnVd13fQdwpXpJkiRJi03XhOrTIzqfK9VLkiRJWjQ6JVRV9YZRnMyV6iVJkiQtJl2foSLJ5kmeluQ1SbZty+6ZZLuxRSdJkiRJC1inHqokuwKn0fQibQt8CrgWOLx9f+hYopMkSZKkBaxrD9U/AqcCOwA39pSfDDxmxDFJkiRJ0lToOinFw4CHVNUtye88znQxvzsjnyRJkiQtGZ2foQI2HVC2kmaac0mSJElacromVKcCr+h5X0mWA28AThl5VJIkSZI0BboO+XsF8J9JzgM2Bz4B7ApcDjx9TLFJkiRJ0oLWdR2qS5PsAxwE3J+mZ+s44ONVdeNcn5UkSZKkxaprDxVt4vTB9iVJkiRJS16nZ6iSPD3JH/W8/7sklyT5YpK7ji88SZIkSVq4uk5KcdTMRpL7A68F3kUz8987Rh+WJEmSJC18XYf87Qyc127/GfCZqnprklOBL44lMkmSJEla4Lr2UP0K2Lrd3g84rd1e01MuSZIkSUtK1x6qrwHvSHI6sAp4Wlu+O/DzcQQmSZIkSQtd1x6qFwM30SRSL6iqS9vy/XHInyRJkqQlqus6VJcATx5Q/rJRByRJkiRJ06LrtOkrkqzoeX+fJG9MctD4QpMkSZKkha3rkL9P0vZQJdke+CrNbH/vTfLKMcUmSZIkSQta14TqvsAZ7fbTgJ9U1Z7As4C/GkdgkiRJkrTQdU2otgDWttuPA05ut78L7DTqoCRJkiRpGnRNqH4MPDXJTsAfAae25TsA144hLkmSJEla8LomVG8A3gJcCJxRVd9qy58AfG8McUmSJEnSgtcpoaqqE4GVNIv6/nHPrtOAV4whLkmSRibJdklOSrIuyUVJDp6l3l8kOS/JmiRXJPlIkuWTjleSND269lBRVZdX1feq6taesm9V1f+MJzRJkkbmWJoF6ncAngG8J8meA+p9HXh4VW0D3INmvcY3TixKSdLU6bSwL0CS3Wlm+FsJ3L53X1U9b8RxSZI0Ekm2BA4E9qqqtcDpSU4GDgGO6K1bVT/v+/gtwK4TCVSSNJU6JVRJngj8G83zUg8AvgPcE9gM+NrYopMkaePtDtxSVef3lJ0N7DuocpJHAKcAy4EbaNZdlCRpoK5D/o4G3lBVDwV+TXNXbxeaZ6i+MpbIJEkaja2ANX1la4CtB1WuqtPbIX93B95GMyHTQEkOS3JmkjNXr149onAlSdOka0J1L+AT7fbNwB2q6lc0idbLxhCXJEmjspamt6nXcuD6uT5UVb8A/h341znqHFdVq6pq1YoVKzY6UEnS9OmaUF0PbN5u/5LfjidfBtxx1EFJkjRC5wPLkuzWU7Y3cE6Hzy6jGeIuSdJAXROqbwGPaLdPAd6R5PXAh4BvjiMwSZJGoarWAScCRyfZMsnDgQOA4/vrJnlGkpVp7Ay8CfjSZCOWJE2TrgnVK4Az2u2jgFNpZkz6CXBo15O5DogkaZ68ENgCuAI4ATi8qs5pk6e1SVa29fYAvkEzTPDrwHnA8+cjYEnSdOg0y19V/axn+wbg8CHP17sOyD7AKUnOrqr+YRcz64BcmWQr4H0064C8ZMjzSpKWsKq6GnjKgPKLaSatmHn/OuB1k4tMkjTtOq9DBZDksTR37wDOraovb8BnXQdEkiRJ0qLSdR2qP6AZf34f4NK2eMckPwQO7O3BmoPrgEiSJElaVLo+Q/UB4DrgHlW1sqpWAvcArgX+ueMxxrIOiGuASJIkSZovXROqhwIvaceaA7eNO395u6+LsawD4hogkiRJkuZL14TqYprZkfptDvQ/7zQb1wGRJEmStKh0TaheCbwryUOSbNK+HgL8Y7tvvVwHRJIkSdJi03WWvxOAzWimM7+1Lbsdzex7H09yW8Wqmmu9qBcCH6RZB+QqetYBAc4F9miHEu4BvAW4I3AN8HngbzrGKkmStOjtcsQp8x3CvLnwmCfOdwjSbbomVC8exclcB0SSJEnSYtJ1Yd+PjDsQSZIkSZo2XZ+hkiRJkiT1MaGSJEmSpCGZUEmSJEnSkGZNqGamLZ9kMJIkSZI0TebqoboAWAGQ5MtJtp1IRJIkSZI0JeZKqK4Htm+3Hw1sOvZoJEmSJGmKzDVt+mnAl5P8qH1/UpKbBlWsqseOPDJJkiRJWuDmSqgOAZ4H7ArsC5wH3DCJoCRJkiRpGsyaUFXVjcCxAEn2AV5ZVddOJixJkiRJWvjm6qG6TVU9ZmY7yVZNUa0bW1SSJEmSNAU6r0OV5EVJLgbWANcluSjJC8cXmiRJkiQtbJ16qJK8Fvgb4O3A6W3xI4FjkiyvqmPGFJ8kSZIkLVidEirgBcBhVXVCT9mXkvwYeDNgQiVJkiRpyek65O/OwHcGlH8b2GF04UiSJEnS9OiaUJ0PHDyg/GCa6dQlSZIkacnpOuTvKOCTSR4FfB0o4BE061P9r/GEJkmSJEkLW9dp009M8mDg5cCTgADnAg+qqu+NMT5pwdvliFPmO4R5ceExT5zvECRJkuZd1x4qquos4JljjEWSJEmSpkrndagkSZIkSb/LhEqSJEmShmRCJUmSJElDMqGSJEmSpCF1npRCkiRJ0vxaqrMLw8KdYbhzQpXkz4H9gDvT17NVVX864rgkSZIkacHrNOQvyduAjwG7ANcCV/W9JElasJJsl+SkJOuSXJTk4FnqPTvJWUmuS3JJkrcmcTSHJGlWXRuJZwEHVdWnxxmMJEljcixwE7ADsA9wSpKzq+qcvnp3AF4GfAtYAZwMvAo4ZmKRSpKmSteE6nbA98cYhyRJY5FkS+BAYK+qWgucnuRk4BDgiN66VfWenre/SPJx4DETC1aSNHW6zvJ3HPDMcQYiSdKY7A7cUlXn95SdDezZ4bOPAvp7sSRJuk3XHqptgYOTPB74AXBz786qesmI45KkBcsZlqbOVsCavrI1wNZzfSjJc4FVwKFz1DkMOAxg5cqVGxelJGkqde2h2oNmyN9NwL2B+/S89up6Mh8KliTNg7XA8r6y5cD1s30gyVNonpvav6qunK1eVR1XVauqatWKFStGEaskacp0SlKqalTjx30oWJI0aecDy5LsVlU/bsv2ZpahfEn+GHg/8MSq+uGEYpQkTakN6vVJsjmwK1DAT6vqVxvwWR8KliRNXFWtS3IicHSSQ2lu6B0APKy/bpLHAh8H/qyqvj3RQCVJU6nrOlSbtmtRXUPzIO8PgWvaoXibdjyXDwVLkubLC4EtgCuAE4DDq+qcJCuTrE0y8wDUkcA2wOfb8rVJvjBPMUuSpkDXHqq3AAcBLwBOb8seCfwDTVL2qg7HGMtDwT4QLElan6q6GnjKgPKLadqnmfeOhpAkbZCuCdXBwPOq6vM9ZT9Nshr4Z7olVBvzUPDjZnsouKqOo5nWnVWrVlWHOCRJkiRpJLrO8rcN8NMB5T+lmVK9i9seCu4p6/JQ8JN9KFiSJEnSQtQ1oTobGLTW1EtpplNfr6paB8w8FLxlkofTPBR8fH/dnoeCD/ShYEmSJEkLVdchf6+meUD38cA3aWb5eyiwI7D/BpzvhcAHaR4Kvoqeh4KBc4E92vHsvQ8Fz3z2a1W1IeeSJEmSpLHqug7VV5PsDryIZmHfAJ8C3l1Vl3Y9mQ8FS5IkSVpMOq9D1SZOrxtjLJIkSZI0VWZNqJLcH/h+Vd3abs+qqr478sgkSZIkaYGbq4fqTOAuNM87nUnz3FQG1Ctgk9GHJkmSJEkL21wJ1R8Aq3u2JUmSJEk9Zk2oquqi3rfAz6vq9xbObWfokyRJkqQlp+s6VBcAK/oLk9yp3SdJkiRJS07XhCo0vVT9tgJ+NbpwJEmSJGl6zDltepJ3tZsF/EOSG3p2bwI8CPj+eEKTJEmSpIVtfetQ3af9b4A/BG7q2XcT8F3g7WOIS5IkSZIWvDkTqqp6DECSDwEvrarrJhKVJEmSJE2Brs9QvRZY3l+Y5O5JdhhtSJIkSZI0HbomVB8F9h9Q/gTg+NGFI0mSJEnTo2tC9UDgqwPKvwasGl04kiRJkjQ9uiZUy4DNBpRvPku5JEmSJC16XROqbwGHDyh/EfCd0YUjSZIkSdNjfdOmz3gd8OUkewNfasseC9wPeNw4ApMkSZKkha5TD1VVnQE8FPgZ8FTgQOAC4KFV9Y3xhSdJkiRJC1fXHiqq6mzgmWOMRZIkSZKmSueEakaSuwC37y2rqotHFpEkSZIkTYlOCVWSbYB3AU+nL5lqbTLKoCRJkiRpGnSd5e/twN7AU4BfAQcDfw1cAvz5WCKTJEmSpAWu65C//YGDquprSW4BzqqqTyT5JfBXwKfHFqEkSZIkLVBde6i2BS5qt9cAd2q3vwk8bMQxSZIkSdJU6JpQ/RS4R7v9I+AvkoRmCvWrxxGYJEmSJC10XROqDwP3bbePoRnmdxPwNuAtow9LkiRJkha+Ts9QVdU7e7a/nOTewCrgx1X1w3EFJ0mSJEkL2XoTqiSbAqcDz6qq8+C2dadce0qSJEnSkrbeIX9VdTPwB0CNPxxJkiRJmh5dn6H6CPD8cQYiSdK4JNkuyUlJ1iW5KMnBs9TbK8kXk1yZxBuJkqT16roO1ZbAM5I8HjgLWNe7s6peMurAJEkaoWNpJlPaAdgHOCXJ2VV1Tl+9m4FPAu8GPjPJACVJ06lrD9UfAt8FrqGZPv0+Pa+9up7MO4SSpElLsiVwIHBkVa2tqtOBk4FD+utW1XlV9QGgP9GSJGmgOXuoktwX+O+qesyIzucdQknSpO0O3FJV5/eUnQ3su7EHTnIYcBjAypUrN/ZwkqQptL4equ8B28+8SXJKkrsOcyLvEEqS5slWwJq+sjXA1ht74Ko6rqpWVdWqFStWbOzhJElTaH0JVfrePwrYYshzzXaHcM8hjydJUhdrgeV9ZcuB6+chFknSItP1GapRGMsdwiSHJTkzyZmrV6/emENJkhan84FlSXbrKdsbR0FIkkZgfQlV8fvrTw07ScRY7hA63EKSNJeqWgecCBydZMskDwcOAI7vr5vG5sDt2/ebJ9lsogFLkqbK+qZND/CxJL9u328OvD/JDb2VqupPO5zrtjuEVfXjtsw7hJKkSXgh8EHgCuAq4PCqOifJSuBcYI+quhjYGbig53M3AhcBu0w2XEnStFhfQvWRvvcfG/ZEVbUuycwdwkNpZvk7AHhYf90kATaj5w5hc4j6dX9dSZLWp6quBp4yoPximiHpM+8v5PefH5YkaVZzJlRV9dwRn887hJIkSZIWjfX1UI2UdwglSZIkLSaTnOVPkiRJkhYVEypJkiRJGpIJlSRJkiQNyYRKkiRJkoZkQiVJkiRJQzKhkiRJkqQhmVBJkiRJ0pBMqCRJkiRpSCZUkiRJkjQkEypJkiRJGpIJlSRJkiQNyYRKkiRJkoZkQiVJkiRJQzKhkiRJkqQhmVBJkiRJ0pBMqCRJkiRpSCZUkiRJkjQkEypJkiRJGpIJlSRJkiQNyYRKkiRJkoZkQiVJkiRJQzKhkiRJkqQhmVBJkiRJ0pBMqCRJkiRpSCZUkiRJkjQkEypJkiRJGpIJlSRJkiQNyYRKkiRJkoZkQiVJkiRJQzKhkiRJkqQhmVBJkiRJ0pAmmlAl2S7JSUnWJbkoycFz1H15ksuSrEnywSSbTTJWSdLiYfsjSRqXSfdQHQvcBOwAPAN4T5I9+ysleQJwBLAfsAtwD+ANkwtTkrTI2P5IksZiYglVki2BA4Ejq2ptVZ0OnAwcMqD6s4EPVNU5VXUN8PfAcyYVqyRp8bD9kSSNU6pqMidK7gd8o6q26Cl7FbBvVT25r+7ZwJur6hPt++2B1cD2VXVVX93DgMPat/cCzhvfT7GgbQ9cOd9BaOL8vS9NS/n3vnNVrdiQD4yr/Wn32wY1lvK/yaXM3/vSs5R/57O2P8smGMRWwJq+sjXA1h3qzmxvDfxOg1ZVxwHHjSjGqZXkzKpaNd9xaLL8vS9N/t432FjaH7ANmuG/yaXJ3/vS4+98sEk+Q7UWWN5Xthy4vkPdme1BdSVJmovtjyRpbCaZUJ0PLEuyW0/Z3sA5A+qe0+7rrXf5oOEWkiSth+2PJGlsJpZQVdU64ETg6CRbJnk4cABw/IDqHwX+MskeSe4I/C3w4UnFOqWW/JCTJcrf+9Lk730D2P5MhP8mlyZ/70uPv/MBJjYpBTTrgAAfBB5PMxb9iKr6lyQrgXOBParq4rbuK4DXAFsA/wa8oKp+PbFgJUmLhu2PJGlcJppQSZIkSdJiMumFfSVJkiRp0TChkiRJkqQhmVBNoST3S/K0JHdIskmSFyd5Z5InzXdskkYrycokf5Zk9wH7DpqPmLS02QZJS4dtUDcmVFMmyV8CnwfeBXyV5sHpPWkWozwhyfPmMTzNk/aPmr+b7zg0Wkn+GPhv4Cjg+0nenWSTnirvm5fAtGTZBqmf7c/iZRvUnZNSTJkk/wP8KRDgR8Ajquob7b4nAG+tqr3nOIQWoSSbATdU1SbrraypkeQs4O+q6pQkOwAfA34NPLWqbkpyfVVtPb9RaimxDVI/25/FyzaoOxOqKZNkTVVt026vA7aq9peY5HbA1VW17TyGqDFJ8sE5di8DnmGDtrj0/v/evl9G06BtT/NH7eU2Zpok26ClyfZnabIN6s4hf9NnXZJN2+0P1+9mxFsAt85DTJqMg4EbgV8MeF0yj3FpfK5JstPMm6r6DXAQcDFwGuAfMJo026ClyfZnabIN6mjZfAegDfYlYFfgR1X1or59TwJ+MPmQNCE/BL5YVSf370iyOXDE5EPSmJ0GPBc4eqag/QP2eUneCzxkvgLTkmUbtDTZ/ixNtkEdOeRvEUmygubf+pXzHYtGL8mLgF9U1WcG7NsE+NuqesPEA9PYJLk9sKyqbphl/8qqunjCYUkD2QYtXrY/S5NtUHcmVJIkSZI0JJ+hkiRJkqQhmVBJkiRJ0pBMqCRJkiRpSCZUkiRJkjQkEypJkiRJGtL/Bz9dTgfmbrzQAAAAAElFTkSuQmCC",
      "text/plain": [
       "<Figure size 864x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Check the class balance\n",
    "train_pclass_value_counts = df_train.pclass.value_counts() / len(df_train)\n",
    "test_pclass_value_counts = df_test.pclass.value_counts() / len(df_test)\n",
    "\n",
    "plt.figure(figsize=(12, 4))\n",
    "\n",
    "plt.subplot(121)\n",
    "plt.title('Train set: passenger class')\n",
    "plt.ylabel('Fraction of passengers')\n",
    "train_pclass_value_counts.plot.bar()\n",
    "\n",
    "plt.subplot(122)\n",
    "plt.title('Test set: passenger class')\n",
    "test_pclass_value_counts.plot.bar()\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From the above diagnostics, we are satisfied that, at least in these few categories, the train and test are similar enough, and we can move forward.\n",
    "\n",
    "## Feature engineering\n",
    "\n",
    "In this section we will use `vaex` to create meaningful features that will be used to train a classification model. To start with, let's get a high level overview of the training data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:38.527108Z",
     "start_time": "2020-05-01T17:12:38.408602Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>pclass</th>\n",
       "      <th>survived</th>\n",
       "      <th>name</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>sibsp</th>\n",
       "      <th>parch</th>\n",
       "      <th>ticket</th>\n",
       "      <th>fare</th>\n",
       "      <th>cabin</th>\n",
       "      <th>embarked</th>\n",
       "      <th>boat</th>\n",
       "      <th>body</th>\n",
       "      <th>home_dest</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>data_type</th>\n",
       "      <td>int64</td>\n",
       "      <td>bool</td>\n",
       "      <td>string</td>\n",
       "      <td>string</td>\n",
       "      <td>float64</td>\n",
       "      <td>int64</td>\n",
       "      <td>int64</td>\n",
       "      <td>string</td>\n",
       "      <td>float64</td>\n",
       "      <td>string</td>\n",
       "      <td>string</td>\n",
       "      <td>string</td>\n",
       "      <td>float64</td>\n",
       "      <td>string</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>1047</td>\n",
       "      <td>1047</td>\n",
       "      <td>1047</td>\n",
       "      <td>1047</td>\n",
       "      <td>841</td>\n",
       "      <td>1047</td>\n",
       "      <td>1047</td>\n",
       "      <td>1047</td>\n",
       "      <td>1046</td>\n",
       "      <td>233</td>\n",
       "      <td>1046</td>\n",
       "      <td>380</td>\n",
       "      <td>102</td>\n",
       "      <td>592</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NA</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>206</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>814</td>\n",
       "      <td>1</td>\n",
       "      <td>667</td>\n",
       "      <td>945</td>\n",
       "      <td>455</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>2.3075453677172875</td>\n",
       "      <td>0.3744030563514804</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>29.565299286563608</td>\n",
       "      <td>0.5100286532951289</td>\n",
       "      <td>0.3982808022922636</td>\n",
       "      <td>--</td>\n",
       "      <td>32.92609101338429</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>159.6764705882353</td>\n",
       "      <td>--</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>0.833269</td>\n",
       "      <td>0.483968</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>14.161953</td>\n",
       "      <td>1.071309</td>\n",
       "      <td>0.890852</td>\n",
       "      <td>--</td>\n",
       "      <td>50.678261</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>96.220759</td>\n",
       "      <td>--</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>1</td>\n",
       "      <td>False</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>0.1667</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>--</td>\n",
       "      <td>0.0</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>1.0</td>\n",
       "      <td>--</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>3</td>\n",
       "      <td>True</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>80.0</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>--</td>\n",
       "      <td>512.3292</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>327.0</td>\n",
       "      <td>--</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                       pclass            survived    name     sex  \\\n",
       "data_type               int64                bool  string  string   \n",
       "count                    1047                1047    1047    1047   \n",
       "NA                          0                   0       0       0   \n",
       "mean       2.3075453677172875  0.3744030563514804      --      --   \n",
       "std                  0.833269            0.483968      --      --   \n",
       "min                         1               False      --      --   \n",
       "max                         3                True      --      --   \n",
       "\n",
       "                          age               sibsp               parch  ticket  \\\n",
       "data_type             float64               int64               int64  string   \n",
       "count                     841                1047                1047    1047   \n",
       "NA                        206                   0                   0       0   \n",
       "mean       29.565299286563608  0.5100286532951289  0.3982808022922636      --   \n",
       "std                 14.161953            1.071309            0.890852      --   \n",
       "min                    0.1667                   0                   0      --   \n",
       "max                      80.0                   8                   9      --   \n",
       "\n",
       "                        fare   cabin embarked    boat               body  \\\n",
       "data_type            float64  string   string  string            float64   \n",
       "count                   1046     233     1046     380                102   \n",
       "NA                         1     814        1     667                945   \n",
       "mean       32.92609101338429      --       --      --  159.6764705882353   \n",
       "std                50.678261      --       --      --          96.220759   \n",
       "min                      0.0      --       --      --                1.0   \n",
       "max                 512.3292      --       --      --              327.0   \n",
       "\n",
       "          home_dest  \n",
       "data_type    string  \n",
       "count           592  \n",
       "NA              455  \n",
       "mean             --  \n",
       "std              --  \n",
       "min              --  \n",
       "max              --  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Imputing\n",
    "\n",
    "We notice that there are 3 columns that have missing data, so our first task will be to impute the missing values with suitable substitutes. This is our strategy:\n",
    "\n",
    "- age: impute with the median age value\n",
    "- fare: impute with the mean fare of the 5 most common values.\n",
    "- cabin: impute with \"M\" for \"Missing\"\n",
    "- Embarked: Impute with with the most common value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:38.546371Z",
     "start_time": "2020-05-01T17:12:38.529144Z"
    }
   },
   "outputs": [],
   "source": [
    "# Handle missing values\n",
    "\n",
    "# Age - just do the median of the training set for now\n",
    "fill_age = df_train.percentile_approx(expression='age', percentage=50.0)\n",
    "# For some numpy versions the `np.percentile` method is broken and returns nan. \n",
    "# As a failsafe, in those cases fill with the mean.\n",
    "if np.isnan(fill_age):\n",
    "    fill_age = df_train.mean(expression='age')\n",
    "df_train['age'] = df_train.age.fillna(value=fill_age)\n",
    "\n",
    "# Fare: the mean of the 5 most common ticket prices.\n",
    "fill_fares = df_train.fare.value_counts(dropna=True)\n",
    "fill_fare = fill_fares.iloc[:5].index.values.mean()\n",
    "df_train['fare'] = df_train.fare.fillna(value=fill_fare)\n",
    "\n",
    "# Cabing: this is a string column so let's mark it as \"M\" for \"Missing\"\n",
    "df_train['cabin'] = df_train.cabin.fillna(value='M')\n",
    "\n",
    "# Embarked: Similar as for Cabin, let's mark the missing values with \"U\" for unknown\n",
    "fill_embarked = df_train.embarked.value_counts(dropna=True).index[0]\n",
    "df_train['embarked'] = df_train.embarked.fillna(value=fill_embarked)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### String processing\n",
    "\n",
    "Next up, let's engineer some new, more meaningful features out of the \"raw\" data that is present in the dataset. \n",
    "Starting with the name of the passengers, we are going to extract the titles, as well as we are going to count the number of words a name contains. These features can be a loose proxy to the age and status of the passengers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:38.587351Z",
     "start_time": "2020-05-01T17:12:38.548452Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Expression = name_title\n",
       "Length: 1,047 dtype: large_string (column)\n",
       "------------------------------------------\n",
       "   0      Mr\n",
       "   1      Mr\n",
       "   2     Mrs\n",
       "   3    Miss\n",
       "   4      Mr\n",
       "    ...     \n",
       "1042  Master\n",
       "1043     Mrs\n",
       "1044  Master\n",
       "1045      Mr\n",
       "1046      Mr"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "Expression = name_num_words\n",
       "Length: 1,047 dtype: int64 (column)\n",
       "-----------------------------------\n",
       "   0  3\n",
       "   1  4\n",
       "   2  5\n",
       "   3  4\n",
       "   4  4\n",
       "  ...  \n",
       "1042  4\n",
       "1043  6\n",
       "1044  4\n",
       "1045  4\n",
       "1046  3"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Engineer features from the names\n",
    "\n",
    "# Titles\n",
    "df_train['name_title'] = df_train['name'].str.replace('.* ([A-Z][a-z]+)\\..*', \"\\\\1\", regex=True)\n",
    "display(df_train['name_title'])\n",
    "\n",
    "# Number of words in the name\n",
    "df_train['name_num_words'] = df_train['name'].str.count(\"[ ]+\", regex=True) + 1\n",
    "display(df_train['name_num_words'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From the cabin colum, we will engineer 3 features:\n",
    " - \"deck\": extacting the deck on which the cabin is located, which is encoded in each cabin value;\n",
    " - \"multi_cabin: a boolean feature indicating whether a passenger is allocated more than one cabin\n",
    " - \"has_cabin\": since there were plenty of values in the original cabin column that had missing values, we are just going to build a feature which tells us whether a passenger had an assigned cabin or not."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:38.747634Z",
     "start_time": "2020-05-01T17:12:38.594540Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Expression = deck\n",
       "Length: 1,047 dtype: string (column)\n",
       "------------------------------------\n",
       "   0  M\n",
       "   1  B\n",
       "   2  M\n",
       "   3  M\n",
       "   4  M\n",
       "  ...  \n",
       "1042  M\n",
       "1043  M\n",
       "1044  M\n",
       "1045  B\n",
       "1046  M"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "Expression = multi_cabin\n",
       "Length: 1,047 dtype: int64 (column)\n",
       "-----------------------------------\n",
       "   0  0\n",
       "   1  0\n",
       "   2  0\n",
       "   3  0\n",
       "   4  0\n",
       "  ...  \n",
       "1042  0\n",
       "1043  0\n",
       "1044  0\n",
       "1045  1\n",
       "1046  0"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "Expression = has_cabin\n",
       "Length: 1,047 dtype: int64 (column)\n",
       "-----------------------------------\n",
       "   0  1\n",
       "   1  1\n",
       "   2  1\n",
       "   3  1\n",
       "   4  1\n",
       "  ...  \n",
       "1042  1\n",
       "1043  1\n",
       "1044  1\n",
       "1045  1\n",
       "1046  1"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "#  Extract the deck\n",
    "df_train['deck'] = df_train.cabin.str.slice(start=0, stop=1)\n",
    "display(df_train['deck'])\n",
    "\n",
    "# Passengers under which name have several rooms booked, these are all for 1st class passengers\n",
    "df_train['multi_cabin'] = ((df_train.cabin.str.count(pat='[A-Z]', regex=True) > 1) &\\\n",
    "                           ~(df_train.deck == 'F')).astype('int')\n",
    "display(df_train['multi_cabin'])\n",
    "\n",
    "# Out of these, cabin has the most missing values, so let's create a feature tracking if a passenger had a cabin\n",
    "df_train['has_cabin'] = df_train.cabin.notna().astype('int')\n",
    "display(df_train['has_cabin'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### More features\n",
    "\n",
    "There are two features that give an indication whether a passenger is travelling alone, or with a famly. \n",
    "These are the \"sibsp\" and \"parch\" columns that tell us the number of siblinds or spouses and the number of parents or children each passenger has on-board respectively. We are going to use this information to build two columns:\n",
    " - \"family_size\" the size of the family of each passenger;\n",
    " - \"is_alone\" an additional boolean feature which indicates whether a passenger is traveling without their family. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:38.813132Z",
     "start_time": "2020-05-01T17:12:38.750219Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Expression = family_size\n",
       "Length: 1,047 dtype: int64 (column)\n",
       "-----------------------------------\n",
       "   0  1\n",
       "   1  1\n",
       "   2  3\n",
       "   3  4\n",
       "   4  1\n",
       "  ...  \n",
       "1042  8\n",
       "1043  2\n",
       "1044  3\n",
       "1045  2\n",
       "1046  1"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "Expression = is_alone\n",
       "Length: 1,047 dtype: int64 (column)\n",
       "-----------------------------------\n",
       "   0  0\n",
       "   1  0\n",
       "   2  0\n",
       "   3  0\n",
       "   4  0\n",
       "  ...  \n",
       "1042  0\n",
       "1043  0\n",
       "1044  0\n",
       "1045  0\n",
       "1046  0"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Size of family that are on board: passenger + number of siblings, spouses, parents, children. \n",
    "df_train['family_size'] = (df_train.sibsp + df_train.parch + 1)\n",
    "display(df_train['family_size'])\n",
    "\n",
    "# Whether or not a passenger is alone\n",
    "df_train['is_alone'] = (df_train.family_size == 0).astype('int')\n",
    "display(df_train['is_alone'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, let's create two new features:\n",
    " - age $\\times$  class\n",
    " - fare per family member, i.e. fare $/$ family_size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:38.831478Z",
     "start_time": "2020-05-01T17:12:38.823592Z"
    }
   },
   "outputs": [],
   "source": [
    "# Create new features\n",
    "df_train['age_times_class'] = df_train.age * df_train.pclass\n",
    "\n",
    "# fare per person in the family\n",
    "df_train['fare_per_family_member'] = df_train.fare / df_train.family_size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Modeling (part 1): gradient boosted trees\n",
    "\n",
    "Since this dataset contains a lot of categorical features, we will start with a tree based model. This we will gear the following feature pre-processing towards the use of tree-based models.\n",
    "\n",
    "### Feature pre-processing for boosted tree models\n",
    "\n",
    "The features \"sex\", \"embarked\", and \"deck\" can be simply label encoded. The feature \"name_tite\" contains certain a larger degree of cardinality, relative to the size of the training set, and in this case we will use the Frequency Encoder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:38.983682Z",
     "start_time": "2020-05-01T17:12:38.833258Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                                </th><th>pclass  </th><th>survived  </th><th>name                                          </th><th>sex   </th><th>age  </th><th>sibsp  </th><th>parch  </th><th>ticket   </th><th>fare    </th><th>cabin  </th><th>embarked  </th><th>boat  </th><th>body  </th><th>home_dest                           </th><th>name_title  </th><th>name_num_words  </th><th>deck  </th><th>multi_cabin  </th><th>has_cabin  </th><th>family_size  </th><th>is_alone  </th><th>age_times_class  </th><th>fare_per_family_member  </th><th>label_encoded_sex  </th><th>label_encoded_embarked  </th><th>label_encoded_deck  </th><th>frequency_encoded_name_title  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i>    </td><td>3       </td><td>False     </td><td>Stoytcheff, Mr. Ilia                          </td><td>male  </td><td>19.0 </td><td>0      </td><td>0      </td><td>349205   </td><td>7.8958  </td><td>M      </td><td>S         </td><td>--    </td><td>nan   </td><td>--                                  </td><td>Mr          </td><td>3               </td><td>M     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>57.0             </td><td>7.8958                  </td><td>1                  </td><td>1                       </td><td>0                   </td><td>0.5787965616045845            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i>    </td><td>1       </td><td>False     </td><td>Payne, Mr. Vivian Ponsonby                    </td><td>male  </td><td>23.0 </td><td>0      </td><td>0      </td><td>12749    </td><td>93.5    </td><td>B24    </td><td>S         </td><td>--    </td><td>nan   </td><td>Montreal, PQ                        </td><td>Mr          </td><td>4               </td><td>B     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>23.0             </td><td>93.5                    </td><td>1                  </td><td>1                       </td><td>1                   </td><td>0.5787965616045845            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i>    </td><td>3       </td><td>True      </td><td>Abbott, Mrs. Stanton (Rosa Hunt)              </td><td>female</td><td>35.0 </td><td>1      </td><td>1      </td><td>C.A. 2673</td><td>20.25   </td><td>M      </td><td>S         </td><td>A     </td><td>nan   </td><td>East Providence, RI                 </td><td>Mrs         </td><td>5               </td><td>M     </td><td>0            </td><td>1          </td><td>3            </td><td>0         </td><td>105.0            </td><td>6.75                    </td><td>0                  </td><td>1                       </td><td>0                   </td><td>0.1451766953199618            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i>    </td><td>2       </td><td>True      </td><td>Hocking, Miss. Ellen &quot;Nellie&quot;                 </td><td>female</td><td>20.0 </td><td>2      </td><td>1      </td><td>29105    </td><td>23.0    </td><td>M      </td><td>S         </td><td>4     </td><td>nan   </td><td>Cornwall / Akron, OH                </td><td>Miss        </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>4            </td><td>0         </td><td>40.0             </td><td>5.75                    </td><td>0                  </td><td>1                       </td><td>0                   </td><td>0.20152817574021012           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i>    </td><td>3       </td><td>False     </td><td>Nilsson, Mr. August Ferdinand                 </td><td>male  </td><td>21.0 </td><td>0      </td><td>0      </td><td>350410   </td><td>7.8542  </td><td>M      </td><td>S         </td><td>--    </td><td>nan   </td><td>--                                  </td><td>Mr          </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>63.0             </td><td>7.8542                  </td><td>1                  </td><td>1                       </td><td>0                   </td><td>0.5787965616045845            </td></tr>\n",
       "<tr><td>...                              </td><td>...     </td><td>...       </td><td>...                                           </td><td>...   </td><td>...  </td><td>...    </td><td>...    </td><td>...      </td><td>...     </td><td>...    </td><td>...       </td><td>...   </td><td>...   </td><td>...                                 </td><td>...         </td><td>...             </td><td>...   </td><td>...          </td><td>...        </td><td>...          </td><td>...       </td><td>...              </td><td>...                     </td><td>...                </td><td>...                     </td><td>...                 </td><td>...                           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,042</i></td><td>3       </td><td>False     </td><td>Goodwin, Master. Sidney Leonard               </td><td>male  </td><td>1.0  </td><td>5      </td><td>2      </td><td>CA 2144  </td><td>46.9    </td><td>M      </td><td>S         </td><td>--    </td><td>nan   </td><td>Wiltshire, England Niagara Falls, NY</td><td>Master      </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>8            </td><td>0         </td><td>3.0              </td><td>5.8625                  </td><td>1                  </td><td>1                       </td><td>0                   </td><td>0.045845272206303724          </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,043</i></td><td>3       </td><td>False     </td><td>Ahlin, Mrs. Johan (Johanna Persdotter Larsson)</td><td>female</td><td>40.0 </td><td>1      </td><td>0      </td><td>7546     </td><td>9.475   </td><td>M      </td><td>S         </td><td>--    </td><td>nan   </td><td>Sweden Akeley, MN                   </td><td>Mrs         </td><td>6               </td><td>M     </td><td>0            </td><td>1          </td><td>2            </td><td>0         </td><td>120.0            </td><td>4.7375                  </td><td>0                  </td><td>1                       </td><td>0                   </td><td>0.1451766953199618            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,044</i></td><td>3       </td><td>True      </td><td>Johnson, Master. Harold Theodor               </td><td>male  </td><td>4.0  </td><td>1      </td><td>1      </td><td>347742   </td><td>11.1333 </td><td>M      </td><td>S         </td><td>15    </td><td>nan   </td><td>--                                  </td><td>Master      </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>3            </td><td>0         </td><td>12.0             </td><td>3.7111                  </td><td>1                  </td><td>1                       </td><td>0                   </td><td>0.045845272206303724          </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,045</i></td><td>1       </td><td>False     </td><td>Baxter, Mr. Quigg Edmond                      </td><td>male  </td><td>24.0 </td><td>0      </td><td>1      </td><td>PC 17558 </td><td>247.5208</td><td>B58 B60</td><td>C         </td><td>--    </td><td>nan   </td><td>Montreal, PQ                        </td><td>Mr          </td><td>4               </td><td>B     </td><td>1            </td><td>1          </td><td>2            </td><td>0         </td><td>24.0             </td><td>123.7604                </td><td>1                  </td><td>0                       </td><td>1                   </td><td>0.5787965616045845            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,046</i></td><td>3       </td><td>False     </td><td>Coleff, Mr. Satio                             </td><td>male  </td><td>24.0 </td><td>0      </td><td>0      </td><td>349209   </td><td>7.4958  </td><td>M      </td><td>S         </td><td>--    </td><td>nan   </td><td>--                                  </td><td>Mr          </td><td>3               </td><td>M     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>72.0             </td><td>7.4958                  </td><td>1                  </td><td>1                       </td><td>0                   </td><td>0.5787965616045845            </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "#      pclass    survived    name                                            sex     age    sibsp    parch    ticket     fare      cabin    embarked    boat    body    home_dest                             name_title    name_num_words    deck    multi_cabin    has_cabin    family_size    is_alone    age_times_class    fare_per_family_member    label_encoded_sex    label_encoded_embarked    label_encoded_deck    frequency_encoded_name_title\n",
       "0      3         False       Stoytcheff, Mr. Ilia                            male    19.0   0        0        349205     7.8958    M        S           --      nan     --                                    Mr            3                 M       0              1            1              0           57.0               7.8958                    1                    1                         0                     0.5787965616045845\n",
       "1      1         False       Payne, Mr. Vivian Ponsonby                      male    23.0   0        0        12749      93.5      B24      S           --      nan     Montreal, PQ                          Mr            4                 B       0              1            1              0           23.0               93.5                      1                    1                         1                     0.5787965616045845\n",
       "2      3         True        Abbott, Mrs. Stanton (Rosa Hunt)                female  35.0   1        1        C.A. 2673  20.25     M        S           A       nan     East Providence, RI                   Mrs           5                 M       0              1            3              0           105.0              6.75                      0                    1                         0                     0.1451766953199618\n",
       "3      2         True        Hocking, Miss. Ellen \"Nellie\"                   female  20.0   2        1        29105      23.0      M        S           4       nan     Cornwall / Akron, OH                  Miss          4                 M       0              1            4              0           40.0               5.75                      0                    1                         0                     0.20152817574021012\n",
       "4      3         False       Nilsson, Mr. August Ferdinand                   male    21.0   0        0        350410     7.8542    M        S           --      nan     --                                    Mr            4                 M       0              1            1              0           63.0               7.8542                    1                    1                         0                     0.5787965616045845\n",
       "...    ...       ...         ...                                             ...     ...    ...      ...      ...        ...       ...      ...         ...     ...     ...                                   ...           ...               ...     ...            ...          ...            ...         ...                ...                       ...                  ...                       ...                   ...\n",
       "1,042  3         False       Goodwin, Master. Sidney Leonard                 male    1.0    5        2        CA 2144    46.9      M        S           --      nan     Wiltshire, England Niagara Falls, NY  Master        4                 M       0              1            8              0           3.0                5.8625                    1                    1                         0                     0.045845272206303724\n",
       "1,043  3         False       Ahlin, Mrs. Johan (Johanna Persdotter Larsson)  female  40.0   1        0        7546       9.475     M        S           --      nan     Sweden Akeley, MN                     Mrs           6                 M       0              1            2              0           120.0              4.7375                    0                    1                         0                     0.1451766953199618\n",
       "1,044  3         True        Johnson, Master. Harold Theodor                 male    4.0    1        1        347742     11.1333   M        S           15      nan     --                                    Master        4                 M       0              1            3              0           12.0               3.7111                    1                    1                         0                     0.045845272206303724\n",
       "1,045  1         False       Baxter, Mr. Quigg Edmond                        male    24.0   0        1        PC 17558   247.5208  B58 B60  C           --      nan     Montreal, PQ                          Mr            4                 B       1              1            2              0           24.0               123.7604                  1                    0                         1                     0.5787965616045845\n",
       "1,046  3         False       Coleff, Mr. Satio                               male    24.0   0        0        349209     7.4958    M        S           --      nan     --                                    Mr            3                 M       0              1            1              0           72.0               7.4958                    1                    1                         0                     0.5787965616045845"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "label_encoder = vaex.ml.LabelEncoder(features=['sex', 'embarked', 'deck'], allow_unseen=True)\n",
    "df_train = label_encoder.fit_transform(df_train)\n",
    "\n",
    "# While doing a transform, previously unseen values will be encoded as \"zero\".\n",
    "frequency_encoder = vaex.ml.FrequencyEncoder(features=['name_title'], unseen='zero')\n",
    "df_train = frequency_encoder.fit_transform(df_train)\n",
    "df_train"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once all the categorical data is encoded, we can select the features we are going to use for training the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:39.052837Z",
     "start_time": "2020-05-01T17:12:38.986328Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                            </th><th style=\"text-align: right;\">  label_encoded_sex</th><th style=\"text-align: right;\">  label_encoded_embarked</th><th style=\"text-align: right;\">  label_encoded_deck</th><th style=\"text-align: right;\">  frequency_encoded_name_title</th><th style=\"text-align: right;\">  multi_cabin</th><th style=\"text-align: right;\">  name_num_words</th><th style=\"text-align: right;\">  has_cabin</th><th style=\"text-align: right;\">  is_alone</th><th style=\"text-align: right;\">  family_size</th><th style=\"text-align: right;\">  age_times_class</th><th style=\"text-align: right;\">  fare_per_family_member</th><th style=\"text-align: right;\">  age</th><th style=\"text-align: right;\">   fare</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i></td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">               3</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">               57</td><td style=\"text-align: right;\">                  7.8958</td><td style=\"text-align: right;\">   19</td><td style=\"text-align: right;\"> 7.8958</td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i></td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   1</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">               4</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">               23</td><td style=\"text-align: right;\">                 93.5   </td><td style=\"text-align: right;\">   23</td><td style=\"text-align: right;\">93.5   </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i></td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.145177</td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">               5</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">            3</td><td style=\"text-align: right;\">              105</td><td style=\"text-align: right;\">                  6.75  </td><td style=\"text-align: right;\">   35</td><td style=\"text-align: right;\">20.25  </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i></td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.201528</td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">               4</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">            4</td><td style=\"text-align: right;\">               40</td><td style=\"text-align: right;\">                  5.75  </td><td style=\"text-align: right;\">   20</td><td style=\"text-align: right;\">23     </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i></td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">               4</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">               63</td><td style=\"text-align: right;\">                  7.8542</td><td style=\"text-align: right;\">   21</td><td style=\"text-align: right;\"> 7.8542</td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "  #    label_encoded_sex    label_encoded_embarked    label_encoded_deck    frequency_encoded_name_title    multi_cabin    name_num_words    has_cabin    is_alone    family_size    age_times_class    fare_per_family_member    age     fare\n",
       "  0                    1                         1                     0                        0.578797              0                 3            1           0              1                 57                    7.8958     19   7.8958\n",
       "  1                    1                         1                     1                        0.578797              0                 4            1           0              1                 23                   93.5        23  93.5\n",
       "  2                    0                         1                     0                        0.145177              0                 5            1           0              3                105                    6.75       35  20.25\n",
       "  3                    0                         1                     0                        0.201528              0                 4            1           0              4                 40                    5.75       20  23\n",
       "  4                    1                         1                     0                        0.578797              0                 4            1           0              1                 63                    7.8542     21   7.8542"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# features to use for the trainin of the boosting model\n",
    "encoded_features = df_train.get_column_names(regex='^freque|^label')\n",
    "features = encoded_features + ['multi_cabin', 'name_num_words', \n",
    "                               'has_cabin', 'is_alone', \n",
    "                               'family_size', 'age_times_class',\n",
    "                               'fare_per_family_member',\n",
    "                               'age', 'fare']\n",
    "\n",
    "# Preview the feature matrix\n",
    "df_train[features].head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Estimator: [xgboost](https://xgboost.readthedocs.io/en/latest/)\n",
    "\n",
    "Now let's feed this data into an a tree based estimator. In this example we will use [xgboost](https://xgboost.readthedocs.io/en/latest/). In principle, any algorithm that follows the [scikit-learn](https://scikit-learn.org/stable/) API convention, i.e. it contains the `.fit`, `.predict` methods is compatable with `vaex`. However, the data will be materialized, i.e. will be read into memory before it is passed on to the estimators. We are hard at work trying to make at least some of the estimators from [scikit-learn](https://scikit-learn.org/stable/) run out-of-core!\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:40.968831Z",
     "start_time": "2020-05-01T17:12:39.055474Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                                </th><th>pclass  </th><th>survived  </th><th>name                                          </th><th>sex   </th><th>age  </th><th>sibsp  </th><th>parch  </th><th>ticket   </th><th>fare    </th><th>cabin  </th><th>embarked  </th><th>boat  </th><th>body  </th><th>home_dest                           </th><th>name_title  </th><th>name_num_words  </th><th>deck  </th><th>multi_cabin  </th><th>has_cabin  </th><th>family_size  </th><th>is_alone  </th><th>age_times_class  </th><th>fare_per_family_member  </th><th>label_encoded_sex  </th><th>label_encoded_embarked  </th><th>label_encoded_deck  </th><th>frequency_encoded_name_title  </th><th>prediction_xgb  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i>    </td><td>3       </td><td>False     </td><td>Stoytcheff, Mr. Ilia                          </td><td>male  </td><td>19.0 </td><td>0      </td><td>0      </td><td>349205   </td><td>7.8958  </td><td>M      </td><td>S         </td><td>--    </td><td>nan   </td><td>--                                  </td><td>Mr          </td><td>3               </td><td>M     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>57.0             </td><td>7.8958                  </td><td>1                  </td><td>1                       </td><td>0                   </td><td>0.5787965616045845            </td><td>0               </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i>    </td><td>1       </td><td>False     </td><td>Payne, Mr. Vivian Ponsonby                    </td><td>male  </td><td>23.0 </td><td>0      </td><td>0      </td><td>12749    </td><td>93.5    </td><td>B24    </td><td>S         </td><td>--    </td><td>nan   </td><td>Montreal, PQ                        </td><td>Mr          </td><td>4               </td><td>B     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>23.0             </td><td>93.5                    </td><td>1                  </td><td>1                       </td><td>1                   </td><td>0.5787965616045845            </td><td>0               </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i>    </td><td>3       </td><td>True      </td><td>Abbott, Mrs. Stanton (Rosa Hunt)              </td><td>female</td><td>35.0 </td><td>1      </td><td>1      </td><td>C.A. 2673</td><td>20.25   </td><td>M      </td><td>S         </td><td>A     </td><td>nan   </td><td>East Providence, RI                 </td><td>Mrs         </td><td>5               </td><td>M     </td><td>0            </td><td>1          </td><td>3            </td><td>0         </td><td>105.0            </td><td>6.75                    </td><td>0                  </td><td>1                       </td><td>0                   </td><td>0.1451766953199618            </td><td>1               </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i>    </td><td>2       </td><td>True      </td><td>Hocking, Miss. Ellen &quot;Nellie&quot;                 </td><td>female</td><td>20.0 </td><td>2      </td><td>1      </td><td>29105    </td><td>23.0    </td><td>M      </td><td>S         </td><td>4     </td><td>nan   </td><td>Cornwall / Akron, OH                </td><td>Miss        </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>4            </td><td>0         </td><td>40.0             </td><td>5.75                    </td><td>0                  </td><td>1                       </td><td>0                   </td><td>0.20152817574021012           </td><td>1               </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i>    </td><td>3       </td><td>False     </td><td>Nilsson, Mr. August Ferdinand                 </td><td>male  </td><td>21.0 </td><td>0      </td><td>0      </td><td>350410   </td><td>7.8542  </td><td>M      </td><td>S         </td><td>--    </td><td>nan   </td><td>--                                  </td><td>Mr          </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>63.0             </td><td>7.8542                  </td><td>1                  </td><td>1                       </td><td>0                   </td><td>0.5787965616045845            </td><td>0               </td></tr>\n",
       "<tr><td>...                              </td><td>...     </td><td>...       </td><td>...                                           </td><td>...   </td><td>...  </td><td>...    </td><td>...    </td><td>...      </td><td>...     </td><td>...    </td><td>...       </td><td>...   </td><td>...   </td><td>...                                 </td><td>...         </td><td>...             </td><td>...   </td><td>...          </td><td>...        </td><td>...          </td><td>...       </td><td>...              </td><td>...                     </td><td>...                </td><td>...                     </td><td>...                 </td><td>...                           </td><td>...             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,042</i></td><td>3       </td><td>False     </td><td>Goodwin, Master. Sidney Leonard               </td><td>male  </td><td>1.0  </td><td>5      </td><td>2      </td><td>CA 2144  </td><td>46.9    </td><td>M      </td><td>S         </td><td>--    </td><td>nan   </td><td>Wiltshire, England Niagara Falls, NY</td><td>Master      </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>8            </td><td>0         </td><td>3.0              </td><td>5.8625                  </td><td>1                  </td><td>1                       </td><td>0                   </td><td>0.045845272206303724          </td><td>0               </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,043</i></td><td>3       </td><td>False     </td><td>Ahlin, Mrs. Johan (Johanna Persdotter Larsson)</td><td>female</td><td>40.0 </td><td>1      </td><td>0      </td><td>7546     </td><td>9.475   </td><td>M      </td><td>S         </td><td>--    </td><td>nan   </td><td>Sweden Akeley, MN                   </td><td>Mrs         </td><td>6               </td><td>M     </td><td>0            </td><td>1          </td><td>2            </td><td>0         </td><td>120.0            </td><td>4.7375                  </td><td>0                  </td><td>1                       </td><td>0                   </td><td>0.1451766953199618            </td><td>0               </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,044</i></td><td>3       </td><td>True      </td><td>Johnson, Master. Harold Theodor               </td><td>male  </td><td>4.0  </td><td>1      </td><td>1      </td><td>347742   </td><td>11.1333 </td><td>M      </td><td>S         </td><td>15    </td><td>nan   </td><td>--                                  </td><td>Master      </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>3            </td><td>0         </td><td>12.0             </td><td>3.7111                  </td><td>1                  </td><td>1                       </td><td>0                   </td><td>0.045845272206303724          </td><td>1               </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,045</i></td><td>1       </td><td>False     </td><td>Baxter, Mr. Quigg Edmond                      </td><td>male  </td><td>24.0 </td><td>0      </td><td>1      </td><td>PC 17558 </td><td>247.5208</td><td>B58 B60</td><td>C         </td><td>--    </td><td>nan   </td><td>Montreal, PQ                        </td><td>Mr          </td><td>4               </td><td>B     </td><td>1            </td><td>1          </td><td>2            </td><td>0         </td><td>24.0             </td><td>123.7604                </td><td>1                  </td><td>0                       </td><td>1                   </td><td>0.5787965616045845            </td><td>0               </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,046</i></td><td>3       </td><td>False     </td><td>Coleff, Mr. Satio                             </td><td>male  </td><td>24.0 </td><td>0      </td><td>0      </td><td>349209   </td><td>7.4958  </td><td>M      </td><td>S         </td><td>--    </td><td>nan   </td><td>--                                  </td><td>Mr          </td><td>3               </td><td>M     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>72.0             </td><td>7.4958                  </td><td>1                  </td><td>1                       </td><td>0                   </td><td>0.5787965616045845            </td><td>0               </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "#      pclass    survived    name                                            sex     age    sibsp    parch    ticket     fare      cabin    embarked    boat    body    home_dest                             name_title    name_num_words    deck    multi_cabin    has_cabin    family_size    is_alone    age_times_class    fare_per_family_member    label_encoded_sex    label_encoded_embarked    label_encoded_deck    frequency_encoded_name_title    prediction_xgb\n",
       "0      3         False       Stoytcheff, Mr. Ilia                            male    19.0   0        0        349205     7.8958    M        S           --      nan     --                                    Mr            3                 M       0              1            1              0           57.0               7.8958                    1                    1                         0                     0.5787965616045845              0\n",
       "1      1         False       Payne, Mr. Vivian Ponsonby                      male    23.0   0        0        12749      93.5      B24      S           --      nan     Montreal, PQ                          Mr            4                 B       0              1            1              0           23.0               93.5                      1                    1                         1                     0.5787965616045845              0\n",
       "2      3         True        Abbott, Mrs. Stanton (Rosa Hunt)                female  35.0   1        1        C.A. 2673  20.25     M        S           A       nan     East Providence, RI                   Mrs           5                 M       0              1            3              0           105.0              6.75                      0                    1                         0                     0.1451766953199618              1\n",
       "3      2         True        Hocking, Miss. Ellen \"Nellie\"                   female  20.0   2        1        29105      23.0      M        S           4       nan     Cornwall / Akron, OH                  Miss          4                 M       0              1            4              0           40.0               5.75                      0                    1                         0                     0.20152817574021012             1\n",
       "4      3         False       Nilsson, Mr. August Ferdinand                   male    21.0   0        0        350410     7.8542    M        S           --      nan     --                                    Mr            4                 M       0              1            1              0           63.0               7.8542                    1                    1                         0                     0.5787965616045845              0\n",
       "...    ...       ...         ...                                             ...     ...    ...      ...      ...        ...       ...      ...         ...     ...     ...                                   ...           ...               ...     ...            ...          ...            ...         ...                ...                       ...                  ...                       ...                   ...                             ...\n",
       "1,042  3         False       Goodwin, Master. Sidney Leonard                 male    1.0    5        2        CA 2144    46.9      M        S           --      nan     Wiltshire, England Niagara Falls, NY  Master        4                 M       0              1            8              0           3.0                5.8625                    1                    1                         0                     0.045845272206303724            0\n",
       "1,043  3         False       Ahlin, Mrs. Johan (Johanna Persdotter Larsson)  female  40.0   1        0        7546       9.475     M        S           --      nan     Sweden Akeley, MN                     Mrs           6                 M       0              1            2              0           120.0              4.7375                    0                    1                         0                     0.1451766953199618              0\n",
       "1,044  3         True        Johnson, Master. Harold Theodor                 male    4.0    1        1        347742     11.1333   M        S           15      nan     --                                    Master        4                 M       0              1            3              0           12.0               3.7111                    1                    1                         0                     0.045845272206303724            1\n",
       "1,045  1         False       Baxter, Mr. Quigg Edmond                        male    24.0   0        1        PC 17558   247.5208  B58 B60  C           --      nan     Montreal, PQ                          Mr            4                 B       1              1            2              0           24.0               123.7604                  1                    0                         1                     0.5787965616045845              0\n",
       "1,046  3         False       Coleff, Mr. Satio                               male    24.0   0        0        349209     7.4958    M        S           --      nan     --                                    Mr            3                 M       0              1            1              0           72.0               7.4958                    1                    1                         0                     0.5787965616045845              0"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import xgboost\n",
    "import vaex.ml.sklearn\n",
    "\n",
    "# Instantiate the xgboost model normally, using the scikit-learn API\n",
    "xgb_model = xgboost.sklearn.XGBClassifier(max_depth=11,\n",
    "                                          learning_rate=0.1, \n",
    "                                          n_estimators=500, \n",
    "                                          subsample=0.75, \n",
    "                                          colsample_bylevel=1, \n",
    "                                          colsample_bytree=1,\n",
    "                                          scale_pos_weight=1.5,\n",
    "                                          reg_lambda=1.5, \n",
    "                                          reg_alpha=5, \n",
    "                                          n_jobs=8,\n",
    "                                          random_state=42,\n",
    "                                          use_label_encoder=False,\n",
    "                                          verbosity=0)\n",
    "\n",
    "# Make it work with vaex (for the automagic pipeline and lazy predictions)\n",
    "vaex_xgb_model = vaex.ml.sklearn.Predictor(features=features,\n",
    "                                           target='survived',\n",
    "                                           model=xgb_model, \n",
    "                                           prediction_name='prediction_xgb')\n",
    "# Train the model\n",
    "vaex_xgb_model.fit(df_train)\n",
    "# Get the prediction of the model on the training data\n",
    "df_train = vaex_xgb_model.transform(df_train)\n",
    "\n",
    "# Preview the resulting train dataframe that contans the predictions\n",
    "df_train"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that in the above cell block, we call `.transform` on the `vaex_xgb_model` object. This adds the \"prediction_xgb\" column as _virtual column_ in the output dataframe. This can be quite convenient when calculating various metrics and making diagnosic plots. Of course, one can call a `.predict` on the `vaex_xgb_model` object, which returns an in-memory `numpy` array object housing the predictions.\n",
    "\n",
    "### Performance on training set\n",
    "\n",
    "Anyway, let's see what the performance is of the model on the training set. First let's create a convenience function that will help us get multiple metrics at once."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:40.985268Z",
     "start_time": "2020-05-01T17:12:40.975947Z"
    }
   },
   "outputs": [],
   "source": [
    "from sklearn.metrics import accuracy_score, f1_score, roc_auc_score\n",
    "def binary_metrics(y_true, y_pred):\n",
    "    acc = accuracy_score(y_true=y_true, y_pred=y_pred)\n",
    "    f1 = f1_score(y_true=y_true, y_pred=y_pred)\n",
    "    roc = roc_auc_score(y_true=y_true, y_score=y_pred)\n",
    "    print(f'Accuracy: {acc:.3f}')\n",
    "    print(f'f1 score: {f1:.3f}')\n",
    "    print(f'roc-auc: {roc:.3f}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's check the performance of the model on the training set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:41.088203Z",
     "start_time": "2020-05-01T17:12:40.988951Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Metrics for the training set:\n",
      "Accuracy: 0.924\n",
      "f1 score: 0.896\n",
      "roc-auc: 0.914\n"
     ]
    }
   ],
   "source": [
    "print('Metrics for the training set:')\n",
    "binary_metrics(y_true=df_train.survived.values, y_pred=df_train.prediction_xgb.values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Automatic pipelines\n",
    "\n",
    "Now, let's inspect the performance of the model on the test set. You probably noticed that, unlike when using other libraries, we did not bother to create a pipeline while doing all the cleaning, inputing, feature engineering and categorial encoding. Well, we did not _explicitly_ create a pipeline. In fact `veax` keeps track of all the changes one applies to a DataFrame in something called a state. A state is the place which contains all the informations regarding, for instance, the virtual columns we've created, which includes the newly engineered features, the categorically encoded columns, and even the model prediction! So all we need to do, is to extract the state from the training DataFrame, and apply it to the test DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:41.299459Z",
     "start_time": "2020-05-01T17:12:41.093866Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                            </th><th style=\"text-align: right;\">  pclass</th><th>survived  </th><th>name                                        </th><th>sex   </th><th style=\"text-align: right;\">   age</th><th style=\"text-align: right;\">  sibsp</th><th style=\"text-align: right;\">  parch</th><th>ticket          </th><th style=\"text-align: right;\">   fare</th><th>cabin  </th><th>embarked  </th><th>boat  </th><th style=\"text-align: right;\">  body</th><th>home_dest               </th><th>name_title  </th><th style=\"text-align: right;\">  name_num_words</th><th>deck  </th><th style=\"text-align: right;\">  multi_cabin</th><th style=\"text-align: right;\">  has_cabin</th><th style=\"text-align: right;\">  family_size</th><th style=\"text-align: right;\">  is_alone</th><th style=\"text-align: right;\">  age_times_class</th><th style=\"text-align: right;\">  fare_per_family_member</th><th style=\"text-align: right;\">  label_encoded_sex</th><th style=\"text-align: right;\">  label_encoded_embarked</th><th style=\"text-align: right;\">  label_encoded_deck</th><th style=\"text-align: right;\">  frequency_encoded_name_title</th><th style=\"text-align: right;\">  prediction_xgb</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>O&#x27;Connor, Mr. Patrick                       </td><td>male  </td><td style=\"text-align: right;\">28.032</td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>366713          </td><td style=\"text-align: right;\"> 7.75  </td><td>M      </td><td>Q         </td><td>--    </td><td style=\"text-align: right;\">   nan</td><td>--                      </td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           84.096</td><td style=\"text-align: right;\">                  7.75  </td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       2</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">               0</td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>Canavan, Mr. Patrick                        </td><td>male  </td><td style=\"text-align: right;\">21    </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>364858          </td><td style=\"text-align: right;\"> 7.75  </td><td>M      </td><td>Q         </td><td>--    </td><td style=\"text-align: right;\">   nan</td><td>Ireland Philadelphia, PA</td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           63    </td><td style=\"text-align: right;\">                  7.75  </td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       2</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">               0</td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i></td><td style=\"text-align: right;\">       1</td><td>False     </td><td>Ovies y Rodriguez, Mr. Servando             </td><td>male  </td><td style=\"text-align: right;\">28.5  </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>PC 17562        </td><td style=\"text-align: right;\">27.7208</td><td>D43    </td><td>C         </td><td>--    </td><td style=\"text-align: right;\">   189</td><td>?Havana, Cuba           </td><td>Mr          </td><td style=\"text-align: right;\">               5</td><td>D     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           28.5  </td><td style=\"text-align: right;\">                 27.7208</td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   4</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">               1</td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>Windelov, Mr. Einar                         </td><td>male  </td><td style=\"text-align: right;\">21    </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>SOTON/OQ 3101317</td><td style=\"text-align: right;\"> 7.25  </td><td>M      </td><td>S         </td><td>--    </td><td style=\"text-align: right;\">   nan</td><td>--                      </td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           63    </td><td style=\"text-align: right;\">                  7.25  </td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">               0</td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i></td><td style=\"text-align: right;\">       2</td><td>True      </td><td>Shelley, Mrs. William (Imanita Parrish Hall)</td><td>female</td><td style=\"text-align: right;\">25    </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      1</td><td>230433          </td><td style=\"text-align: right;\">26     </td><td>M      </td><td>S         </td><td>12    </td><td style=\"text-align: right;\">   nan</td><td>Deer Lodge, MT          </td><td>Mrs         </td><td style=\"text-align: right;\">               6</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            2</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           50    </td><td style=\"text-align: right;\">                 13     </td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.145177</td><td style=\"text-align: right;\">               1</td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "  #    pclass  survived    name                                          sex        age    sibsp    parch  ticket               fare  cabin    embarked    boat      body  home_dest                 name_title      name_num_words  deck      multi_cabin    has_cabin    family_size    is_alone    age_times_class    fare_per_family_member    label_encoded_sex    label_encoded_embarked    label_encoded_deck    frequency_encoded_name_title    prediction_xgb\n",
       "  0         3  False       O'Connor, Mr. Patrick                         male    28.032        0        0  366713             7.75    M        Q           --         nan  --                        Mr                           3  M                   0            1              1           0             84.096                    7.75                      1                         2                     0                        0.578797                 0\n",
       "  1         3  False       Canavan, Mr. Patrick                          male    21            0        0  364858             7.75    M        Q           --         nan  Ireland Philadelphia, PA  Mr                           3  M                   0            1              1           0             63                        7.75                      1                         2                     0                        0.578797                 0\n",
       "  2         1  False       Ovies y Rodriguez, Mr. Servando               male    28.5          0        0  PC 17562          27.7208  D43      C           --         189  ?Havana, Cuba             Mr                           5  D                   0            1              1           0             28.5                     27.7208                    1                         0                     4                        0.578797                 1\n",
       "  3         3  False       Windelov, Mr. Einar                           male    21            0        0  SOTON/OQ 3101317   7.25    M        S           --         nan  --                        Mr                           3  M                   0            1              1           0             63                        7.25                      1                         1                     0                        0.578797                 0\n",
       "  4         2  True        Shelley, Mrs. William (Imanita Parrish Hall)  female  25            0        1  230433            26       M        S           12         nan  Deer Lodge, MT            Mrs                          6  M                   0            1              2           0             50                       13                         0                         1                     0                        0.145177                 1"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# state transfer to the test set\n",
    "state = df_train.state_get()\n",
    "df_test.state_set(state)\n",
    "\n",
    "# Preview of the \"transformed\" test set\n",
    "df_test.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that once we apply the state from the train to the test set, the test DataFrame contains all the features we created or modified in the training data, and even the predictions of the xgboost model!\n",
    "\n",
    "The state is a simple Python dictionary, which can be easily stored as JSON to disk, which makes it very easy to deploy.\n",
    "\n",
    "### Performance on test set\n",
    "\n",
    "Now it is trivial to check the model performance on the test set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:41.381884Z",
     "start_time": "2020-05-01T17:12:41.310025Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Metrics for the test set:\n",
      "Accuracy: 0.786\n",
      "f1 score: 0.728\n",
      "roc-auc: 0.773\n"
     ]
    }
   ],
   "source": [
    "print('Metrics for the test set:')\n",
    "binary_metrics(y_true=df_test.survived.values, y_pred=df_test.prediction_xgb.values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Feature importance\n",
    "Let's now look at the feature importance of the `xgboost` model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:41.911379Z",
     "start_time": "2020-05-01T17:12:41.384369Z"
    },
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAh8AAAIbCAYAAABLzPzHAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAABDY0lEQVR4nO3deZglVX3/8feHQXYcQHCZEWaMgBrEoLagJooGN0QUN1RUBFxAYwB/boQYxQXFKAoGEwRxww3U4AZENAqigNIjKCIaQQaGYWeYjZ3h+/ujqvXS9k539TTzfj3Pfbxddc6p76ka537mVN0mVYUkSVJX1pruAiRJ0prF8CFJkjpl+JAkSZ0yfEiSpE4ZPiRJUqcMH5IkqVOGD0m6n0iyVZKVSWZNdy3SSAwfkrSaSfLKJL9IckuS69v3b0mSkfpV1ZVVtVFVreqqVmkiDB+StBpJ8nbgaOBjwEOBhwAHAH8PrDONpUmTJv6GU0laPSSZDVwN7F1V3xqmzW7Ah4BHAsuAE6rqsHbffOBy4AFVdXeSM4GzgX8EHgecC+xVVTdO7UykkbnyIUmrj6cA6wLfGaHNLcDewCbAbsCbk+wxQvu9gH2BB9OsnLxjMgqV7gvDhyStPjYHbqyquwc2JDknydIktyV5elWdWVUXVdU9VfUb4GvAziOM+fmq+r+qug04GdhhSmcgjYHhQ5JWHzcBmydZe2BDVT21qjZp962VZKckP0lyQ5JlNM+DbD7CmNf2vL8V2GgK6pbGxfAhSauPc4E7gBeN0OarwHeBLatqNnAsMOK3YKTVjeFDklYTVbUUeD/wn0lelmSjJGsl2QHYsG22MbCkqm5PsiPNMx3SjLL26E0kSV2pqn9Pshh4F/AlmgdM/wS8GzgHeAtwZJJjgLNonuPYZHqqlSbGr9pKkqROedtFkiR1yvAhSZI6ZfiQJEmdMnxIkqROGT4kSVKn/KqtNAabb755zZ8/f7rLkKQZZcGCBTdW1RaDtxs+pDGYP38+/f39012GJM0oSa4Yaru3XSRJUqcMH5IkqVOGD0mS1CnDhyRJ6pThQ5IkdcrwIUmSOmX4kCRJnTJ8SJKkThk+JElSpwwfkiSpU4YPSZLUKcOHJEnqlOFDkiR1yvAhSZI6ZfiQJEmdMnxIkqROGT4kSVKnDB+SJKlThg9JktQpw4ckSerU2tNdwP1JkoXAG6rqR6O0K2Cbqrp0AseYcN8uJfkCcFVVvafLvlPlosXLmH/IqdNdhiR1auERu03JuK58SJKkThk+JElSpwwfUyDJjknOTbI0yTVJjkmyzqBmz0/ypyQ3JvlYkrV6+u+X5JIkNyf5QZJ54zz+ukk+nuTKJNclOTbJ+u2+ZyS5Ksnbk1zf1rdvT9/1kxyZ5Ioky5L8rKfvC5Nc3M7rzCSP6en3+CS/SrIiyUnAeoNqekGSC9u+5yR53Fj7DjPHzZN8vx1vSZKzB85hkjlJvpXkhiSXJzmw3b5ZO/fd2583SnJpkr2HOcabkvQn6V9167KxXwBJ0ogMH1NjFfA2YHPgKcAuwFsGtXkx0Ac8AXgRsB9Akj2AQ4GXAFsAZwNfG+fxPwpsC+wAbA3MBd7bs/+hwOx2++uBTyfZtN33ceCJwFOBzYB3Afck2bat4+C2rtOA7yVZpw1W3wZObPt8A3jpwMGSPAH4HLA/8CDgM8B325A0Yt8RvB24qq3lITTnrNoA8j3g1+38dgEOTvLcqlpCc56PT/Jg4JPAhVX1paEOUFXHVVVfVfXN2mD2GEqSJI2F4WMKVNWCqjqvqu6uqoU0H7Y7D2r20apaUlVXAkcBr2q37w98pKouqaq7gQ8DO4x19SNJgDcCb2vHX9GO8cqeZncBH6iqu6rqNGAl8Kj2g3s/4KCqWlxVq6rqnKq6A3gFcGpV/bCq7qIJKevThJQnAw8AjmrH/CZwfs/x3gh8pqp+0Y75ReCOtt9ofYdzF/AwYF7b7+yqKuBJwBZV9YGqurOq/gQcPzD/qjqDJuD8L7Bbe74lSR0yfEyBJNu2twSuTbKc5sN/80HNFvW8vwKY076fBxzd3k5YCiwBQvOv+LHYAtgAWNAzxv+02wfc1AabAbcCG7U1rgdcNsS4c9o6Aaiqe9o5zG33LW4//HvnNGAe8PaBetqatmz7jdZ3OB8DLgXOaG9fHdJzrDmDjnUozerIgOOAxwKfr6qbxnAsSdIk8qu2U+O/gAuAV1XViiQHAy8b1GZL4OL2/VbA1e37RcDhVfWVCR77RuA2YLuqWjyBvrcDj6S5bdHramD7gR/aFZYtgcVAAXOTpCdEbMVfQszAnA4ffMAkO4/Sd0jtis7baULNdsBPkpzfHuvyqtpmqH5JZtGsRH0JeHOSz4/la8vbz51N/xR95UyS1jSufEyNjYHlwMokjwbePESbdybZNMmWwEHASe32Y4F/aT9QSTI7ycvHeuB2ReJ44JPtcw0kmZvkuWPs+zngE+1Dm7OSPCXJusDJwG5JdknyAJoP/juAc4BzgbuBA5OsneQlwI49Qx8PHJBkpzQ2TLJbko3H0HdI7QOsW7chaDnNczargF8Cy5O8u314dlaSxyZ5Utv10PZ/96O5dfSlNpBIkjpi+Jga7wD2AlbQfPCeNESb7wALgAuBU4ETAKrqFJoHRr/e3rL5LbDrOI//bppbEue1Y/wIeNQ4ar+I5rmLJW0ta1XVH4DXAP9Bs0KyO7B7+1zFnTQPyO4D3EzzfMh/DwxYVf00z30c0+6/tG3LaH1HsE07r5U0AeY/q+rMqlrV1rYDcHlb62eB2UmeCPw/YO+23UdpVm0O+evhJUlTJfe+1S5pKH19fdXf3z/dZUjSjJJkQVX1Dd7uyockSeqU4WOGan/Z18ohXq+e7tomS5JDh5nj6dNdmyRp4vy2ywxVVdtNdw1Trao+TPM1ZUnS/YgrH5IkqVOGD0mS1CnDhyRJ6pThQ5IkdcrwIUmSOmX4kCRJnTJ8SJKkThk+JElSpwwfkiSpU4YPSZLUKcOHJEnqlOFDkiR1yvAhSZI6ZfiQJEmdMnxIkqROGT4kSVKnDB+SJKlThg9JktQpw4ckSeqU4UOSJHXK8CFJkjpl+JAkSZ1ae7QGSR4FfB3YGvjXqvrUlFelv5LkC8BVVfWeLvuuzpI8DfhsVT1qmP3zgcuBB1TV3fflWBctXsb8Q069L0NI0qRaeMRu013ChI1l5eNdwJlVtbHBQ9MpSSXZeuDnqjq7N3gkWZjkWdNTnSRprMYSPuYBFw+1I8msyS1HkiTd340YPpL8GHgmcEySlUm+muS/kpyW5BbgmUnmJPlWkhuSXJ7kwJ7+6yf5QpKbk/wuyTuTXNWz/17/km3bfqjn5xckuTDJ0iTnJHlcz76FSd6R5DdJliU5Kcl6Pftf1PZdnuSyJM9L8vIkCwbN8e1Jvj3KeVg3yceTXJnkuiTHJlm/3feMJFe141yf5Jok+w46B0cmuaKt82c9fV+Y5OJ2fmcmeUxPv8cn+VWSFUlOAtYbVNNI52bEvsPMcbR57JbkgvZ8LkpyWM+++e213Lfdd3OSA5I8qb0+S5McM+h4+yW5pG37gyTzRqnvp+3bX7d/Fl8xUHO7/0RgK+B77f53DTHG7CQntHNbnORDBmhJ6t6I4aOq/hE4G3hrVW0E3AnsBRwObAycA3wP+DUwF9gFODjJc9sh3gc8sn09F3jdWAtL8gTgc8D+wIOAzwDfTbJuT7M9gecBjwAeB+zT9t0R+BLwTmAT4OnAQuC7wCN6P+SB1wAnjlLOR4FtgR1onn2ZC7y3Z/9Dgdnt9tcDn06yabvv48ATgacCm9HcxronybbA14CDgS2A02g+ONdJsg7w7bauzYBvAC8dy7kZre8oRprHLcDeNOdzN+DNSfYY1H8nYBvgFcBRwL8CzwK2A/ZMsnNb/x7AocBL2rmf3Z6LYVXV09u3f1dVG1XVSYP2vxa4Eti93f/vQwzzReBummv4eOA5wBuGO2aSNyXpT9K/6tZlI5UnSRqHiXzb5TtV9fOqugfYHtiiqj5QVXdW1Z+A44FXtm33BA6vqiVVtQgYzzMjbwQ+U1W/qKpVVfVF4A7gyT1tPlVVV1fVEpoQtEO7/fXA56rqh1V1T1UtrqrfV9UdwEk0gYMk2wHzge8PV0SStLW8rZ3HCuDDPXMEuAv4QFXdVVWnASuBRyVZC9gPOKitYVVVndPW8Qrg1LbGu2hCyvo0IeXJwAOAo9oxvwmcP8ZzM1rfkQw5D4CqOrOqLmrP529owsLOg/p/sKpur6ozaMLK16rq+qpaTBMwHt+22x/4SFVd0j4I+mFgh9FWP+6LJA8BdgUOrqpbqup64JPc+zreS1UdV1V9VdU3a4PZU1WaJK1xRv22yxAW9byfB8xJsrRn2yyaDxqAOYPaXzGO48wDXpfkn3u2rdOOOeDanve39uzbkmYlYShfBL6W5D3Aa4GT2zAwnC2ADYAFTQ4BIDTzHHDToG9T3ApsBGxOc8vjsiHGnUPP+aiqe5Isoll1WAUsrqrqad977kY6NzVK35EMNw+S7AQcATy2Pda6NKsqva7reX/bED9v1FP/0UmO7NkfmrmP58/IeMyjCWXX9FzHtbj3n09JUgcmEj56P9QWAZdX1TbDtL2GJggMPLC61aD9t9J8sA94KDDwTMgimlWTwydQ4yKaWz1/parOS3In8DSaW0h7jTLWjTQfnNu1/4IfjxuB29tafj1o39U0K0fAn1dYtgQW05zjuUnSEyK24i8hZthz097aGKnvRH0VOAbYtapuT3IUTbiaiIH6v3IfaxqsRti3iGZ1aPP7+rVbSdJ9M5Hw0euXwPIk76a5pXIn8Bhg/ao6HzgZ+JckvwA2BP55UP8Lgb2SXAw8m2YZv7/ddzxwSpIftcfZAHgG8NP21sdITgDOSPJ94CfAw4CNq+r37f4v0XyQ3l1VPxtpoHZF4njgk0neWlXXJ5kLPLaqfjCGvp8DPpHktTQrATsCv6I5N4ck2QX4KXAQzYfjOW33u4EDk3waeGHb7yejnRvg3FH6TtTGwJI2eOxIE9rOmOBYxwIfTHJhVV2cZDbwnKoavJIy2HXA3wCXjrL/r1TVNUnOAI5M8m80t5QeATy8qs4areDt586mfwZ/p16SVif36TecVtUqYHeaZy0up/mX/mdpHloEeD/NMvrlNB9Ugx/sPKjtvxR4Nc2DkgNj99M823AMcDPNB84+Y6zrl8C+NPf0lwFn0Sy7DziR5vbBaA+aDnh3e/zzkiwHfkT7LMQYvAO4iOa5iyU0D6+uVVV/oHn25D9oztvuNA9L3llVd9I8jLkPzdxfAfx3z/yGPTej9b0P3gJ8IMkKmodtT57oQFV1Cs15+Hp7Pn9L8zzGaA4Dvth+e2bPIfZ/BHhPu/8dQ+zfm+aW0e9ozs03aYKpJKlDufejAVN8sOQZwJer6uGdHXToOtYHrgeeUFV/nM5aNDP09fVVf3//6A0lSX+WZEFV9Q3evqb+t13eDJxv8JAkqXtrXPhIspDmds/bB22/uP3lVINfr56WQqdAkkOHmePp010bNP+tlmHqWzndtUmSJk+nt12kmcrbLpI0ft52kSRJqwXDhyRJ6pThQ5IkdcrwIUmSOmX4kCRJnTJ8SJKkThk+JElSpwwfkiSpU4YPSZLUKcOHJEnqlOFDkiR1yvAhSZI6ZfiQJEmdMnxIkqROGT4kSVKnDB+SJKlThg9JktQpw4ckSeqU4UOSJHXK8CFJkjpl+JAkSZ0yfEiSpE4ZPkSSRyW5IMmKJAdO4rivTnJGz8+VZOvJGr9n3K2SrEwya7LHliRNvlTVdNegaZbkBGB5Vb1tio9TwDZVdelUHmcqrPuwbephrztqusuY0RYesdt0lyCpY0kWVFXf4O2ufAhgHnDxdBchSVozGD7WcEl+DDwTOKa9dXFQewtmeZJFSQ7raTu/vXWyb7vv5iQHJHlSkt8kWZrkmJ72+yT52RDHfFKS65Ks3bPtpUkuHKXWHZP0t7Vdl+QTg+paO8lT2nkMvG5PsrBtt1aSQ5JcluSmJCcn2ew+nkJJ0jgZPtZwVfWPwNnAW6tqI+DXwN7AJsBuwJuT7DGo207ANsArgKOAfwWeBWwH7Jlk51GOeT5wE/Dsns2vAU4cpdyjgaOr6oHAI4GThxj73KraqJ3LpsB5wNfa3QcCewA7A3OAm4FPD3ewJG9qw07/qluXjVKaJGmsDB+6l6o6s6ouqqp7quo3NB/cg8PEB6vq9qo6A7gF+FpVXV9Vi2mCzOPHcKgv0gQO2tWH5wJfHaXPXcDWSTavqpVVdd4o7T/V1vev7c/7A/9aVVdV1R3AYcDLeldgelXVcVXVV1V9szaYPYYpSZLGwvChe0myU5KfJLkhyTLgAGDzQc2u63l/2xA/bzSGQ30Z2D3JRsCewNlVdc0ofV4PbAv8Psn5SV4wwjz2B54B7FVV97Sb5wGntLeHlgKXAKuAh4yhXknSJDF8aLCvAt8Ftqyq2cCxQCb7IO0qybnAi4HXMvotF6rqj1X1KuDBwEeBbybZcHC7JE8DPgi8qKp675csAnatqk16Xuu1tUiSOjLkcrPWaBsDS6rq9iQ7AnsBZ4zSZ6K+BBxCuyIxWuMkrwF+UFU3tCsX0Kxc9LbZEjgJ2Luq/m/QEMcChyd5XVVdkWQL4KlV9Z3Rjr393Nn0+1VRSZoUrnxosLcAH0iyAngvQzzUOYlOoQ0eVXXLGNo/D7g4yUqah09fWVW3D2qzC/BQmlWRgW+8DHyN+GiaVZ0z2vmdR/PwrCSpQ/6SMU2rJJcB+1fVj6a7lpH09fVVf3//dJchSTOKv2RMq50kLwUK+PF01yJJ6o7hQ9MiyZnAfwH/1PNtFJKcPuiXhA28Dp22YiVJk8oHTjUtquoZw2zfteNSJEkdc+VDkiR1yvAhSZI6ZfiQJEmdMnxIkqROGT4kSVKnDB+SJKlThg9JktQpw4ckSeqU4UOSJHXK8CFJkjpl+JAkSZ0yfEiSpE4ZPiRJUqcMH5IkqVOGD0mS1CnDhyRJ6pThQ5IkdcrwIUmSOmX4kCRJnTJ8SJKkThk+JElSpwwfkiSpU4YPSZLUqbWnu4CZJMlC4A1V9aNR2hWwTVVdOoFjTLhvl5J8Abiqqt7TVd/JODdJzgS+XFWfHU+/ixYvY/4hp070sPcLC4/YbbpLkHQ/4cqHJEnqlOFDkiR1yvAxAUl2THJukqVJrklyTJJ1BjV7fpI/JbkxyceSrNXTf78klyS5OckPkswb5/HXTfLxJFcmuS7JsUnWb/c9I8lVSd6e5Pq2vn17+q6f5MgkVyRZluRnPX1fmOTidl5nJnlMT7/HJ/lVkhVJTgLWG1TTC5Jc2PY9J8njxtp3hHm+s63/6iT7jfUctPtf1NazPMllSZ43xPgPS/KbJO8YSz2SpMlh+JiYVcDbgM2BpwC7AG8Z1ObFQB/wBOBFwH4ASfYADgVeAmwBnA18bZzH/yiwLbADsDUwF3hvz/6HArPb7a8HPp1k03bfx4EnAk8FNgPeBdyTZNu2joPbuk4DvpdknTZYfRs4se3zDeClAwdL8gTgc8D+wIOAzwDfbQPCiH2H04aFdwDPBrYBnjXWc5BkR+BLwDuBTYCnAwsHjT8fOAs4pqo+PkwNb0rSn6R/1a3LRitZkjRGho8JqKoFVXVeVd1dVQtpPmx3HtTso1W1pKquBI4CXtVu3x/4SFVdUlV3Ax8Gdhjr6keSAG8E3taOv6Id45U9ze4CPlBVd1XVacBK4FHt6st+wEFVtbiqVlXVOVV1B/AK4NSq+mFV3UUTUtanCSlPBh4AHNWO+U3g/J7jvRH4TFX9oh3zi8Adbb/R+g5nT+DzVfXbqroFOGwc5+D1wOfaudzTzvX3PWP/LXAm8L6qOm64AqrquKrqq6q+WRvMHkPJkqSx8NsuE9CuEnyCZmVjA5rzuGBQs0U9768A5rTv5wFHJzmyd0iaf7lfMYbDb9Eec0HzGfzn/rN62tzUBpsBtwIb0azUrAdcNsS4c3qPX1X3JFnU1rUKWFxVNWhOA+YBr0vyzz3b1mnHrFH6DmcO9z6nvX1GOwdb0qzcDOfVwKXAN8dQhyRpkhk+Jua/gAuAV1XViiQHAy8b1GZL4OL2/VbA1e37RcDhVfWVCR77RuA2YLuqWjyBvrcDjwR+PWjf1cD2Az+0qwtbAotpAsTcJOkJEVvxlxAzMKfDBx8wyc6j9B3ONe3xB2w1aB4jnYNF7RyHcxjwPOCrSV5ZVatGqYXt586m36+aStKk8LbLxGwMLAdWJnk08OYh2rwzyaZJtgQOAk5qtx8L/EuS7QCSzE7y8rEeuKruAY4HPpnkwe0Yc5M8d4x9Pwd8IsmcJLOSPCXJusDJwG5JdknyAODtNLdOzgHOBe4GDkyydpKXADv2DH08cECSndLYMMluSTYeQ9/hnAzsk+Rvk2wAvG8c5+AEYN92Lmu1+x7dM/ZdwMuBDYET0/MwsCRp6vmX7sS8A9gLWEHzIXjSEG2+Q3Pb4ELgVJoPRKrqFJqHJb+eZDnwW2DXcR7/3TS3Dc5rx/gR8Khx1H4RzXMXS9pa1qqqPwCvAf6DZmVhd2D3qrqzqu6keUB2H+BmmudD/ntgwKrqp3kG45h2/6VtW0brO5yqOp3mWZkft+P9eKznoKp+CewLfBJYRvNg6b2eqemp68HA5wwgktSd3PtWvKSh9PX1VX9//3SXIUkzSpIFVdU3eLv/2pMkSZ0yfKym2l/2tXKI16unu7bJkuTQYeZ4+nTXJkmaOn7bZTVVVdtNdw1Trao+TPP7OSRJaxBXPiRJUqcMH5IkqVOGD0mS1CnDhyRJ6pThQ5IkdcrwIUmSOmX4kCRJnTJ8SJKkThk+JElSpwwfkiSpU4YPSZLUKcOHJEnqlOFDkiR1yvAhSZI6ZfiQJEmdMnxIkqROGT4kSVKnDB+SJKlThg9JktQpw4ckSeqU4UOSJHXK8CFJkjpl+JAkSZ1ae7oL0MyW5DBg66p6zTD7Xw28rqqeM0XHL2Cbqrp0Ko990eJlzD/k1Ps6zGph4RG7TXcJktZwrnxo0iSZn6SS/DnUVtVXpip4jGY6jy1JGp7hQ5IkdcrwsYZKsjDJO5P8JsktSU5I8pAkpydZkeRHSTZN8owkVw3R91lDDPvT9n+XJlmZ5ClJ9knyszHUs12SHyZZkuS6JIe223dMcm6SpUmuSXJMknUGdX9+kj8luTHJx5Ks1fa917HbVZkDkvwxyc1JPp0kI9T0piT9SfpX3bpstClIksbI8LFmeynwbGBbYHfgdOBQYHOaPxsHjnO8p7f/u0lVbVRV546lU5KNgR8B/wPMAbYG/rfdvQp4W1vTU4BdgLcMGuLFQB/wBOBFwH4jHO4FwJOAvwP2BJ47XMOqOq6q+qqqb9YGs8cyFUnSGBg+1mz/UVXXVdVi4GzgF1V1QVXdAZwCPL6jOl4AXFtVR1bV7VW1oqp+AVBVC6rqvKq6u6oWAp8Bdh7U/6NVtaSqrgSOAl41wrGOqKqlbdufADtM9mQkSSPz2y5rtut63t82xM8bdVTHlsBlQ+1Isi3wCZqVjQ1o/swuGNRsUc/7K2hWT4Zzbc/7W+lujpKkluFDo7mF5kMfgCSzgC2GaVsTPMYihl+t+C/gAuBVVbUiycHAywa12RK4uH2/FXD1BOsY1vZzZ9PvV1QlaVJ420Wj+T9gvSS7JXkA8B5g3WHa3gDcA/zNOI/xfeChSQ5Osm6SjZPs1O7bGFgOrEzyaODNQ/R/Z/tw7JbAQcBJ4zy+JKlDhg+NqKqW0Tzg+VlgMc1KyFXDtL0VOBz4efvtlCeP8RgraB583Z3mtsgfgWe2u98B7AWsAI5n6GDxHZpbMRcCpwInjOW4kqTpkaqJrpRLa46+vr7q7++f7jIkaUZJsqCq+gZvd+VDkiR1ygdO1YkkT6P5PSJ/par8xokkrUEMH+pEVZ2NX2uVJOFtF0mS1DHDhyRJ6pThQ5IkdcrwIUmSOmX4kCRJnTJ8SJKkThk+JElSpwwfkiSpU4YPSZLUKcOHJEnqlOFDkiR1yvAhSZI6ZfiQJEmdMnxIkqROGT4kSVKnDB+SJKlThg9JktQpw4ckSeqU4UOSJHXK8CFJkjpl+JAkSZ0yfAiAJMcm+bfprmMkSZ6R5KrprkOSdN+sPd0FqHtJ9gHeUFX/MLCtqg6YvopWfxctXsb8Q06d7jLGZeERu013CZI0JFc+JElSpwwfq4kkhyS5LMmKJL9L8uJ2+6wkRya5McnlSd6apJKs3e6fneSEJNckWZzkQ0lmjXCcxwDHAk9JsjLJ0nb7F5J8qH3/jCRXJXlXkuvbsfdI8vwk/5dkSZJDe8Zcq6f+m5KcnGSzdt96Sb7cbl+a5PwkDxnlXGyW5PNJrk5yc5Jvj+ectfu2TnJWkmXtuTup3Z4kn2zntSzJb5I8dizXSJI0Obztsvq4DHgacC3wcuDLSbYGXgTsCuwA3AJ8Y1C/LwLXAVsDGwLfBxYBnxnqIFV1SZIDGHTbZQgPBdYD5gL7AMcDPwSeCGwFLEjy9ar6E3AgsAewM3AD8Cng08CrgNcBs4EtgTvaedw2yrk4EVgJbNf+71OHaTfkOauqa4APAmcAzwTWAfraPs8Bng5sCywDHg0sHaUeSdIkcuVjNVFV36iqq6vqnqo6CfgjsCOwJ3B0VV1VVTcDRwz0aVcQdgUOrqpbqup64JPAKyehpLuAw6vqLuDrwOZtHSuq6mLgYuBxbdv9gX9ta7wDOAx4Wbs6cxfwIGDrqlpVVQuqavlwB03ysHZOB1TVzVV1V1WdNVTbEc7ZQP3zgDlVdXtV/axn+8Y0oSNVdUkbVoaq5U1J+pP0r7p12RhOmSRpLAwfq4kkeye5sL01sRR4LM0H/hyalYwBve/nAQ8Arunp9xngwZNQ0k1Vtap9P7BScV3P/tuAjXrqOKWnhkuAVcBDaFYxfgB8vb2N8u9JHjDCcbcElrRBa0QjnDOAdwEBfpnk4iT7AVTVj4FjaFZmrktyXJIHDjV+VR1XVX1V1Tdrg9mjlSNJGiPDx2ogyTya2xpvBR5UVZsAv6X58LwGeHhP8y173i+iuZWxeVVt0r4eWFXbjXLImrTi/1LHrj01bFJV61XV4nbl4v1V9bc0t09eAOw9ylibJdlkpAOOcs6oqmur6o1VNYdmZeY/29tYVNWnquqJNLd1tgXeOfGpS5LGy2c+Vg8b0gSCGwCS7Evzr3iAk4GDkpxK88zHuwc6VdU1Sc4Ajmx/R8dK4BHAw4e7VdG6Dnh4knWq6s5JqP9Y4PAkr6uqK5JsATy1qr6T5JnAjcDvgOU0tz1WDTdQO6fTacLCP7VzekpV/XRQ05HOGUleDpxbVVcBN7dtVyV5Ek3o/hXN+bx9pHoGbD93Nv1+dVWSJoUrH6uBqvodcCRwLk0w2B74ebv7eJoHJ38DXACcBtzNXz4w96Z5oPJ3NB+y3wQeNsohf0zzzMa1SW6chCkcDXwXOCPJCuA8YKd230PbmpbT3I45C/jyKOO9liak/B64Hjh4cINRzhnAk4BfJFnZ1nZQVV0OPJDmnN4MXAHcBHx8XLOVJN0nqZrsFXhNpSS7AsdW1bzprmVN0tfXV/39/dNdhiTNKEkWVFXf4O2ufKzmkqzf/n6NtZPMBd4HnDLddUmSNFGGj9VfgPfT3Ca4gObWxXtH7dT8t1pWDvE6dorrHZNhaluZ5GnTXZskaWr5wOlqrqpupXl+Ybz9DgBW2/9eS1VtNHorSdL9kSsfkiSpU4YPSZLUKcOHJEnqlOFDkiR1yvAhSZI6ZfiQJEmdMnxIkqROGT4kSVKnDB+SJKlThg9JktQpw4ckSeqU4UOSJHXK8CFJkjpl+JAkSZ0yfEiSpE4ZPiRJUqcMH5IkqVOGD0mS1CnDhyRJ6pThQ5IkdcrwIUmSOmX4kCRJnTJ8SJKkTs3Y8JHkUUkuSLIiyYHTXc9kSvL3Sf6YZGWSPSZ57EOTfLZ9Pz9JJVl7Mo/RlSRnJnnDdNchSRqfGfmh03oXcGZVPX66C5kCHwCOqaqjJ3vgqvrwZI+5Jrho8TLmH3LqdJcxZguP2G26S5CkYc3YlQ9gHnDxeDutDv/KH0MNE5qbJiaNmfz/BUmaUWbkX7hJfgw8EzimvTVxUHsLZnmSRUkO62k7cGvh9UmuBH7cbt8vySVJbk7ygyTzxnDcSnJgkj8luTHJx3o/tEYas+37T0n+CPxxhGNcBvwN8L12busm2bcdd0V77P172j8jyVVJ3pXk+iTXJNkjyfOT/F+SJUkO7Wl/WJIvD3HclydZMGjb25N8e5Rz8oUk/5nk9Lbenyd5aJKj2vPw+ySP72k/J8m3ktyQ5PLeW2Ztbd9I8uV2rhcl2TbJv7RzW5TkOYNKeGSSXyZZluQ7STbrGe/JSc5JsjTJr5M8o2ffmUkOT/Jz4Nb2nEuSOjAjw0dV/SNwNvDWqtoI+DWwN7AJsBvw5iGeldgZeAzw3HbfocBLgC3asb42xsO/GOgDngC8CNgPYIxj7gHsBPztCHN7JHAlsHtVbVRVdwDXAy8AHgjsC3wyyRN6uj0UWA+YC7wXOB54DfBE4GnAe5OM9uH6XeARSR7Ts+01wImj9APYE3gPsDlwB3Au8Kv2528CnwBog9r3aK7XXGAX4OAkz+0Za/f2mJsCFwA/oPlzOpfmdtRnBh17b5prMAe4G/hUe6y5wKnAh4DNgHcA30qyRU/f1wJvAjYGrhg8qSRvStKfpH/VrcvGcBokSWMxI8PHYFV1ZlVdVFX3VNVvaD70dx7U7LCquqWqbgP2Bz5SVZdU1d3Ah4EdxrL6AXy0qpZU1ZXAUcCr2u1jGfMjbd/bxjm/U6vqsmqcBZxBEyoG3AUcXlV3AV+n+dA/uqpWVNXFNLdwHjfKMe4ATqIJHCTZDpgPfH8MJZ5SVQuq6nbgFOD2qvpSVa1qxxxY+XgSsEVVfaCq7qyqP9EEpVf2jHV2Vf2gPYffoAlyR/TMbX6STXran1hVv62qW4B/A/ZMMqudx2lVdVr75+KHQD/w/J6+X6iqi6vq7nb8wefkuKrqq6q+WRvMHsNpkCSNxf0ifCTZKclP2qX8ZcABNB/AvRb1vJ8HHN0uxy8FlgCh+df1aHrHuYLmX9xjHbO375gl2TXJee0tlKU0H6C987up/aAHGAg21/Xsvw3YaAyH+iKwV5LQrAqc3IaS0Qw+1nDHngfMGThH7VwOBR4ywlg3DjG33rkMvh4PoDk384CXDzrWPwAPG6avJKkj0/7w5ST5KnAMsGtV3Z7kKP46fFTP+0U0KwVfmcCxtuQvD4NuBVw9jjFrhH1DSrIu8C2a2wvfqaq72ucwMt6xRlNV5yW5k2ZVZa/2NZkWAZdX1TaTOOaWPe+3olkFurE91olV9cYR+o77ekiS7rv7S/jYGFjSBo8daT40zxih/bHAB5NcWFUXJ5kNPKeqvjGGY70zyS9o/vV9EO3zDPdxzJGsA6wL3ADcnWRX4DnAb+/juMP5Ek2Qu7uqfjbJY/8SWJ7k3TTPZtxJ8xzO+lV1/gTHfE2SLwELaZ4J+WZVrWofqj2/fZ7kRzQrIk8GLq2qq8Z7kO3nzqbfr69K0qS4X9x2Ad4CfCDJCpoHLk8eqXFVnQJ8FPh6kuU0H+S7jvFY3wEWABfSPNB4wiSMOVKtK4ADaeZ0M02w+u59HXcEJwKPZWwPmo5Le/tkd2AH4HKaFYrPAvflgYoTgS8A19I8dHtge6xFNA8EH0oT3BYB7+T+82dekmasVLnyPFZJCtimqi6d7lqmSpL1ab5d84SqGvYrwWuavr6+6u/vn+4yJGlGSbKgqvoGb/dfgRrszcD5Bg9J0lS5vzzzMSmSPA04fah97e8TmVHHGa8kC2keZN1j0PaLab49Mtj+E3xoV5K0BjN89KiqsxnhK6lVNSnfMBntONOlquYPs327jkuRJN2PedtFkiR1yvAhSZI6ZfiQJEmdMnxIkqROGT4kSVKnDB+SJKlThg9JktQpw4ckSeqU4UOSJHXK8CFJkjpl+JAkSZ0yfEiSpE4ZPiRJUqcMH5IkqVOGD0mS1CnDhyRJ6pThQ5IkdcrwIUmSOmX4kCRJnTJ8SJKkThk+JElSpwwfkiSpU4YPSZLUqbWnuwBpqiU5DNi6ql4z0TEuWryM+YecOnlFTZKFR+w23SVI0ri58iFJkjpl+ND9ShJX8yRpNWf4mEJJFiZ5R5LfJFmW5KQk6yXZNMn3k9yQ5Ob2/cN7+p2Z5ENJzkmyMsn3kjwoyVeSLE9yfpL5Pe0fneSHSZYk+UOSPcdQ2xeSfDrJqUlWJPlFkke2++Ynqd4P8ramN7Tv90ny8ySfTLI0yZ+SPLXdvijJ9UleN8rxH9H2Xav9+bNJru/Z/+UkB7fv5yT5bju/S5O8safdYUm+2bZfDuzTjn1WO68fApv3tF+vbXtTe/zzkzxktPMlSZo8ho+ptyfwPOARwOOAfWjO++eBecBWwG3AMYP6vRJ4LTAXeCRwbttnM+AS4H0ASTYEfgh8FXgw8CrgP5NsN4baXgW8H9gUuBQ4fBzz2gn4DfCg9thfB54EbA28BjgmyUbDda6qy4HlwOPbTU8DViZ5TPvz04Gz2vdfA64C5gAvAz6cZJee4V4EfBPYBPhKW88CmtDxQaA3CL0OmA1s2dZ+AM35/ytJ3pSkP0n/qluXjXAqJEnjYfiYep+qqquragnwPWCHqrqpqr5VVbdW1QqaD/2dB/X7fFVdVlXLgNOBy6rqR1V1N/AN/vKh/QJgYVV9vqrurqpfAd+i+ZAezX9X1S/bMb8C7DCOeV3eHnMVcBLNh/kHquqOqjoDuJMmiIzkLGDnJA9tf/5m+/MjgAcCv06yJfAPwLur6vaquhD4LE0wG3BuVX27qu4BtqAJQf/W1vJTmvM+4C6a0LF1Va2qqgVVtXyo4qrquKrqq6q+WRvMHut5kSSNwvvjU+/anve3AnOSbAB8kmZFZNN238ZJZrUf5gDX9fS7bYifB1YV5gE7JVnas39t4MQJ1DbsSsUQBtdDVQ1X43DOAl5Is6rxU+BMmlBxO3B2Vd2TZA6wpA1pA64A+np+XtTzfg5wc1XdMqj9lu37E9v3X0+yCfBl4F+r6q5RapUkTRLDx/R4O/AoYKequjbJDsAFQCYw1iLgrKp69iTWN/DBvQHNrRGAhw7T9r44C/gYTfg4C/gZcCxN+Bi45XI1sFmSjXsCyFbA4p5xquf9NcCmSTbsCSBbDbRpQ8b7gfe3z82cBvwBOGGkQrefO5t+v9YqSZPC2y7TY2OalYGlSTajfX5jgr4PbJvktUke0L6e1PPsxLhV1Q00H+6vSTIryX40z51Mqqr6I815eA3w0/b2x3XAS2nDR1UtAs4BPtI+LPo44PU0t4mGGvMKoJ8mXKyT5B+A3Qf2J3lmku2TzKIJVncBq4YaS5I0NQwf0+MoYH3gRuA84H8mOlC7GvAcmgdUr6a5lfJRYN37WOMbgXcCNwHb0QSAqXAWcFNVXdnzc2hWgga8CphPM79TgPdV1Q9HGHMvmgdil9AEuy/17HsozbMly2ke3D2L5taLJKkjqarRW0lruL6+vurv75/uMiRpRkmyoKr6Bm935UOSJHXK8HE/luTi9peUDX69ek2qQZK0evHbLvdjVTWWXzR2v69BkrR6ceVDkiR1yvAhSZI6ZfiQJEmdMnxIkqROGT4kSVKnDB+SJKlThg9JktQpw4ckSeqU4UOSJHXK8CFJkjpl+JAkSZ0yfEiSpE4ZPiRJUqcMH5IkqVOGD0mS1CnDhyRJ6pThQ5IkdcrwIUmSOmX4kCRJnTJ8SJKkThk+JElSpwwfkiSpU2tPdwHSTHDR4mXMP+TU6S7jzxYesdt0lyBJEzZjVj6SLEzyrDG0qyRbT/AYE+7bpSRfSPKhrvtO8Hjz2/M6aUF3sq5TkjOTvGEyapIkjd2MCR+SJOn+wfChGWMyV08kSdNnxoWPJDsmOTfJ0iTXJDkmyTqDmj0/yZ+S3JjkY0nW6um/X5JLktyc5AdJ5o3z+Osm+XiSK5Ncl+TYJOu3+56R5Kokb09yfVvfvj19109yZJIrkixL8rOevi9McnE7rzOTPKan3+OT/CrJiiQnAesNqukFSS5s+56T5HFj7TvCPEcac2GSdyb5TZJbkpyQ5CFJTm+P86Mkmw4acr8kV7fn5O09Y414PdtbLP+U5I/AH4eo8x+SLEryzPbnYa9vkmcn+X177o8BMso5eFOS/iT9q25dNpbTJkkagxkXPoBVwNuAzYGnALsAbxnU5sVAH/AE4EXAfgBJ9gAOBV4CbAGcDXxtnMf/KLAtsAOwNTAXeG/P/ocCs9vtrwc+3fNB/HHgicBTgc2AdwH3JNm2rePgtq7TgO8lWaf9IP42cGLb5xvASwcOluQJwOeA/YEHAZ8BvtuGpBH7DmekMXuavRR4dnsudgdOpzm3m9P8uTpw0LDPBLYBngMc0vP8zliu5x7ATsDfDqrzuTTn7aVV9ZORrm+SzYFvAe9pj3UZ8PcjnYeqOq6q+qqqb9YGs0dqKkkahxkXPqpqQVWdV1V3V9VCmg/GnQc1+2hVLamqK4GjgFe12/cHPlJVl1TV3cCHgR3GuvqRJMAbgbe1469ox3hlT7O7gA9U1V1VdRqwEnhUu/qyH3BQVS2uqlVVdU5V3QG8Aji1qn5YVXfRhJT1aULKk4EHAEe1Y34TOL/neG8EPlNVv2jH/CJwR9tvtL7DGWnMAf9RVddV1WKaD/lfVNUF7XxOAR4/aMz3V9UtVXUR8HnaazLG6/mR9nzf1rPt5cBxwPOr6pfttpGu7/OB31XVN9tzfBRw7RjOhSRpks24e+jtKsEnaFY2NqCZw4JBzRb1vL8CmNO+nwccneTI3iFpVimuGMPht2iPuaDJIX/uP6unzU3tB9+AW4GNaP61vR7Nv7gHm9N7/Kq6J8mitq5VwOKqqkFzGjAPeF2Sf+7Ztk47Zo3SdzgjjTngup73tw3x80aDxhx8TbaHCV3PAQcDX2rDTG/dw13fOb3jVFW151iS1LEZFz6A/wIuAF5VVSuSHAy8bFCbLYGL2/dbAVe37xcBh1fVVyZ47BtpPli3a//FP96+twOPBH49aN/VtB/G8OcVli2BxTQBYm6S9ISIrfhLiBmY0+GDD5hk51H6DmfYMe+DLYHf99QwcE3Gcj2Lv/Zy4IQki6vqqEF1/9X1TbJNW8PAz+n9eTTbz51Nv79bQ5ImxYy77QJsDCwHViZ5NPDmIdq8M8mmSbYEDgJOarcfC/xLku0AksxO8vKxHriq7gGOBz6Z5MHtGHPbZw/G0vdzwCeSzEkyK8lT2ucoTgZ2S7JLkgcAb6e5zXEOcC5wN3BgkrWTvATYsWfo44EDkuyUxoZJdkuy8Rj6DmekMSfq35Js0J77ffnLNRnL9RzK1TTPhxyYZOAZkZGu76nAdklekuZbMwfSPJ8jSerYTAwf7wD2AlbQfEieNESb79As3V9I86FzAkBVnULzwOjXkywHfgvsOs7jvxu4FDivHeNHwKPGUftFNM9dLGlrWauq/gC8BvgPmhWS3YHdq+rOqrqT5gHKfYCbaZ4P+e+BAauqn+YZjWPa/Ze2bRmt73BGGvM+OKsd53+Bj1fVGe32sVzP4eq8kiaAvDvJG0a6vlV1I81qyRHATTQPv/78Ps5JkjQBuffjAJKG0tfXV/39/dNdhiTNKEkWVFXf4O0zceVDkiTNYIaPIaT5ZV8rh3i9erprmyxJDh1mjqdPd22SpPu3mfhtlylXVdtNdw1Trao+TPN7MCRJ6pQrH5IkqVOGD0mS1CnDhyRJ6pThQ5IkdcrwIUmSOmX4kCRJnTJ8SJKkThk+JElSpwwfkiSpU4YPSZLUKcOHJEnqlOFDkiR1yvAhSZI6ZfiQJEmdMnxIkqROGT4kSVKnDB+SJKlThg9JktQpw4ckSeqU4UOSJHXK8CFJkjpl+NCMkuRRSS5IsiLJgdNdjyRp/Nae7gKkcXoXcGZVPb7Lg160eBnzDzm1y0P+2cIjdpuW40rSVHHlQzPNPODi8XZKYtCWpNWE4UMzRpIfA88EjkmyMslB7S2Y5UkWJTmsp+38JJXk9UmuBH7cbt8vySVJbk7ygyTzpmc2krTmMnxoxqiqfwTOBt5aVRsBvwb2BjYBdgPenGSPQd12Bh4DPLfddyjwEmCLdqyvdVG7JOkvDB+asarqzKq6qKruqarf0ASJnQc1O6yqbqmq24D9gY9U1SVVdTfwYWCH4VY/krwpSX+S/lW3LpvSuUjSmsTwoRkryU5JfpLkhiTLgAOAzQc1W9Tzfh5wdJKlSZYCS4AAc4cav6qOq6q+quqbtcHsKZiBJK2ZDB+ayb4KfBfYsqpmA8fShIle1fN+EbB/VW3S81q/qs7pqF5JEn7VVjPbxsCSqro9yY7AXsAZI7Q/Fvhgkgur6uIks4HnVNU3RjvQ9nNn0+9XXiVpUrjyoZnsLcAHkqwA3gucPFLjqjoF+Cjw9STLgd8Cu055lZKke0lVjd5KWsP19fVVf3//dJchSTNKkgVV1Td4uysfkiSpU4YPSZLUKcOHJEnqlOFDkiR1yvAhSZI6ZfiQJEmdMnxIkqROGT4kSVKnDB+SJKlThg9JktQpw4ckSeqU4UOSJHXK8CFJkjpl+JAkSZ0yfEiSpE4ZPiRJUqcMH5IkqVOGD0mS1CnDhyRJ6pThQ5IkdcrwIUmSOmX4kCRJnTJ8SJKkThk+JElSpwwfkiSpU4YPSZLUqbWnuwBpJrho8TLmH3JqZ8dbeMRunR1LkrrmyockSeqU4UOSJHXK8KEZKckhSS5LsiLJ75K8uN0+K8mRSW5McnmStyapJGu3+2cnOSHJNUkWJ/lQklnTOxtJWrP4zIdmqsuApwHXAi8Hvpxka+BFwK7ADsAtwDcG9fsicB2wNbAh8H1gEfCZwQdI8ibgTQCzHrjFVMxBktZIrnxoRqqqb1TV1VV1T1WdBPwR2BHYEzi6qq6qqpuBIwb6JHkITTA5uKpuqarrgU8CrxzmGMdVVV9V9c3aYPaUz0mS1hSufGhGSrI38P+A+e2mjYDNgTk0KxkDet/PAx4AXJNkYNtag9pIkqaY4UMzTpJ5wPHALsC5VbUqyYVAgGuAh/c037Ln/SLgDmDzqrq7o3IlSYMYPjQTbQgUcANAkn2Bx7b7TgYOSnIqzTMf7x7oVFXXJDkDODLJvwErgUcAD6+qs0Y64PZzZ9Pv796QpEnhMx+acarqd8CRwLk0D49uD/y83X08cAbwG+AC4DTgbmBVu39vYB3gd8DNwDeBh3VVuyQJUlXTXYM0ZZLsChxbVfPuyzh9fX3V398/SVVJ0pohyYKq6hu83ZUP3a8kWT/J85OsnWQu8D7glOmuS5L0F4YP3d8EeD/NLZULgEuA905rRZKke/GBU92vVNWtwJOmuw5J0vBc+ZAkSZ0yfEiSpE4ZPiRJUqcMH5IkqVOGD0mS1CnDhyRJ6pThQ5IkdcrwIUmSOmX4kCRJnTJ8SJKkThk+JElSpwwfkiSpU4YPSZLUKcOHJEnqlOFDkiR1yvAhSZI6ZfiQJEmdMnxIkqROGT4kSVKnDB+SJKlThg9JktQpw4ckSeqU4UOSJHXK8KFJk+TiJM+YgnG/kORDkz2uJGl6rD3dBej+o6q2m+4aJEmrP1c+JElSpwwfmjRJFiZ5VpIdk/QnWZ7kuiSfGEPfbyS5NsmyJD9NMuwqSpI3Jrk0yZIk300yp2dfJTkgyR+T3Jzk00nSs3+/JJe0+36QZN4Ix3lTO4/+G264YTynQpI0AsOHpsLRwNFV9UDgkcDJY+hzOrAN8GDgV8BXhmqU5B+BjwB7Ag8DrgC+PqjZC4AnAX/Xtntu23cP4FDgJcAWwNnA14YrqKqOq6q+qurbYostxjAFSdJYGD40Fe4Ctk6yeVWtrKrzRutQVZ+rqhVVdQdwGPB3SWYP0fTVwOeq6ldt238BnpJkfk+bI6pqaVVdCfwE2KHdvj/wkaq6pKruBj4M7DDS6ockafIZPjQVXg9sC/w+yflJXjBS4ySzkhyR5LIky4GF7a7Nh2g+h2a1A4CqWgncBMztaXNtz/tbgY3a9/OAo5MsTbIUWAJkUF9J0hTz2y6adFX1R+BVSdaiucXxzSQPqqpbhumyF/Ai4Fk0wWM2cDNNMBjsapoQAUCSDYEHAYvHUNoi4PCqGvKWjiSpG658aNIleU2SLarqHmBpu3nVCF02Bu6gWcHYgOZ2yHC+CuybZIck67Ztf1FVC8dQ2rHAvww8zJpkdpKXj6GfJGkSGT40FZ4HXJxkJc3Dp6+sqttHaP8lmlspi4HfAcM+I1JV/wv8G/At4BqaB1pfOZaiquoU4KPA19vbO78Fdh1LX0nS5ElVTXcN0mqvr6+v+vv7p7sMSZpRkiyoqr7B2135kCRJnTJ8qBNJXp1k5RCvi6e7NklSt/y2izrRfsPEb5lIklz5kCRJ3TJ8SJKkThk+JElSpwwfkiSpU4YPSZLUKcOHJEnqlOFDkiR1yvAhSZI6ZfiQJEmdMnxIkqROGT4kSVKnDB+SJKlThg9JktQpw4ckSeqU4UOSJHXK8CFJkjpl+JAkSZ0yfEiSpE4ZPiRJUqcMH5IkqVOGD0mS1CnDhyRJ6pThQ5IkdcrwoRElWZjkWdNdx3CS7JPkZyPsPz3J67qsSZI0srWnuwBpKlXVrtNdgyTp3lz5kCRJnTJ8aCx2SPKbJMuSnJRkvSSbJvl+khuS3Ny+f/hAh/Z2yJ+SrEhyeZJXj3aQJG9Mcknb53dJntBuPyTJZT3bX/zXXfMfbX2/T7JLz44zk7yhp6afJfl4W/PlSYZdGUnypiT9SfpvuOGGcZ80SdLQDB8aiz2B5wGPAB4H7EPzZ+fzwDxgK+A24BiAJBsCnwJ2raqNgacCF450gCQvBw4D9gYeCLwQuKndfRnwNGA28H7gy0ke1tN9J+BPwObA+4D/TrLZMIfaCfhD2/bfgROSZKiGVXVcVfVVVd8WW2wxUvmSpHEwfGgsPlVVV1fVEuB7wA5VdVNVfauqbq2qFcDhwM49fe4BHptk/aq6pqouHuUYbwD+varOr8alVXUFQFV9oz3+PVV1EvBHYMeevtcDR1XVXe3+PwC7DXOcK6rq+KpaBXwReBjwkPGdDknSfWH40Fhc2/P+VmCjJBsk+UySK5IsB34KbJJkVlXdArwCOAC4JsmpSR49yjG2pFnh+CtJ9k5yYZKlSZYCj6VZuRiwuKqq5+crgDmjzaWqbm3fbjRKbZKkSWT40ES9HXgUsFNVPRB4ers9AFX1g6p6Ns3Kwu+B40cZbxHwyMEbk8xr+74VeFBVbQL8duA4rbmDbp1sBVw93glJkrph+NBEbUzznMfS9vmK9w3sSPKQJC9sn/24A1gJrBplvM8C70jyxDS2boPHhkABN7Rj70uz8tHrwcCBSR7QPjvyGOC0+z5FSdJUMHxooo4C1gduBM4D/qdn31o0KyNXA0tongV5y0iDVdU3aJ4b+SqwAvg2sFlV/Q44EjgXuA7YHvj5oO6/ALZpazkceFlV3YQkabWUe98qlzSUvr6+6u/vn+4yJGlGSbKgqvoGb3flQ5Ikdcrwoc4kOTbJyiFex053bZKk7vjfdlFnquoAmq/fSpLWYK58SJKkThk+JElSpwwfkiSpU4YPSZLUKcOHJEnqlOFDkiR1yvAhSZI6ZfiQJEmdMnxIkqROGT4kSVKnDB+SJKlThg9JktQpw4ckSeqU4UOSJHXK8CFJkjqVqpruGqTVXpIVwB+mu45Jsjlw43QXMYmcz+rN+ay+upjLvKraYvDGtaf4oNL9xR+qqm+6i5gMSfrvL3MB57O6cz6rr+mci7ddJElSpwwfkiSpU4YPaWyOm+4CJtH9aS7gfFZ3zmf1NW1z8YFTSZLUKVc+JElSpwwfkiSpU4YPrZGSbJbklCS3JLkiyV4jtH1bkmuTLEvyuSTrTmScqTSJ8zkzye1JVravafndJmOdT5LHJvlBkhuT/NU95NXh+kziXGbatXldkgVJlie5Ksm/J1l7vONMtUmcz0y7Pq9M8of274Hrk3wxyQPHO85EGT60pvo0cCfwEODVwH8l2W5woyTPBQ4BdgHmA38DvH+843RgsuYD8Naq2qh9PWpKqx7eWM/rXcDJwOvv4zhTabLmAjPr2mwAHEzzi6x2ovkz944JjDPVJms+MLOuz8+Bv6+q2TR/D6wNfGgC40xMVfnytUa9gA3b/1Nt27PtROCIIdp+Ffhwz8+7ANeOd5yZMJ/25zOBN8yU69Ozf+vmr7P7Ns7qOpeZfG162v0/4Hury7WZzPnM9OsDbAR8CTitq+vjyofWRNsCq6rq/3q2/RoYKtVv1+7rbfeQJA8a5zhTabLmM+Aj7dL/z5M8Y7KLHYPJOq+rw/WZ7Bpm8rV5OnDxJIwzmSZrPgNm1PVJ8g9JlgErgJcCR01knIkwfGhNtBGwbNC2ZcDGY2g78H7jcY4zlSZrPgDvplmCnUvzOwC+l+SRk1fqmEzWeV0drs9k1jBjr02SfYE+4OP3ZZwpMFnzgRl4farqZ9Xcdnk48DFg4UTGmQjDh9ZEK4EHDtr2QJr0P1rbgfcrxjnOVJqs+VBVv6iqFVV1R1V9kea+8PMnud7RTNZ5XR2uz6TVMFOvTZI9gCOAXatq4D9itjpcmwnVMcx8Zuz1AaiqxcD/AF+/L+OMh+FDa6L/A9ZOsk3Ptr/jr5dQabf93aB211XVTeMcZypN1nyGUkAmpcqxm6zzujpcn6msYbW/NkmeBxwP7F5VF010nCk0WfMZymp/fQZZGxhYqZn66zOdD8f48jVdL5qE/zWaB6v+nmZJcbsh2j0PuBb4W2BT4Mf0PHQ11nFmwnyATYDnAuvR/EX0auAW4FGr8XzS1vu3NH/Zrwesuzpdn8mYywy9Nv8I3AQ8/b6MMxPmM0Ovz6uBrdo/d/OAs4D/7ur6dHpSfPlaXV7AZsC3278grgT2ardvRbPkuFVP2/8HXAcsBz4/6MNtyHFm4nyALYDzaZZWlwLnAc9enedD83XhGvRauDpdn8mYywy9Nj8B7m63DbxOX52uzWTNZ4Zen8OBq9p2V9E8p/Kgrq6P/20XSZLUKZ/5kCRJnTJ8SJKkThk+JElSpwwfkiSpU4YPSZLUKcOHJEnqlOFDkiR1yvAhSZI6ZfiQJEmd+v+hE+TYKKVG9AAAAABJRU5ErkJggg==",
      "text/plain": [
       "<Figure size 432x648 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.figure(figsize=(6, 9))\n",
    "\n",
    "ind = np.argsort(xgb_model.feature_importances_)[::-1]\n",
    "features_sorted = np.array(features)[ind]\n",
    "importances_sorted = xgb_model.feature_importances_[ind]\n",
    "\n",
    "plt.barh(y=range(len(features)), width=importances_sorted, height=0.2)\n",
    "plt.title('Gain')\n",
    "plt.yticks(ticks=range(len(features)), labels=features_sorted)\n",
    "plt.gca().invert_yaxis()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Modeling (part 2): Linear models & Ensembles\n",
    "\n",
    "Given the randomness of the _Titanic dataset_ , we can be satisfied with the performance of `xgboost` model above. Still, it is always usefull to try a variety of models and approaches, especially since `vaex` makes makes this process rather simple. \n",
    "\n",
    "In the following part we will use a couple of linear models as our predictors, this time straight from `scikit-learn`. This requires us to pre-process the data in a slightly different way.\n",
    "\n",
    "### Feature pre-processing for linear models\n",
    "\n",
    "When using linear models, the safest option is to encode categorical variables with the one-hot encoding scheme, especially if they have low cardinality. We will do this for the \"family_size\" and \"deck\" features. Note that the \"sex\" feature is already encoded since it has only unique values options. \n",
    "\n",
    "The \"name_title\" feature is a bit more tricky. Since in its original form it has some values that only appear a couple of times, we will do a trick: we will one-hot encode the frequency encoded values. This will reduce cardinality of the feature, while also preserving the most important, i.e. most common values.\n",
    "\n",
    "Regarding the \"age\" and \"fare\", to add some variance in the model, we will not convert them to categorical as before, but simply remove their mean and standard-deviations (standard-scaling). We will do the same to the \"fare_per_family_member\" feature.\n",
    "\n",
    "\n",
    "Finally, we will drop out any other features."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:41.979030Z",
     "start_time": "2020-05-01T17:12:41.922481Z"
    }
   },
   "outputs": [],
   "source": [
    "# One-hot encode categorical features\n",
    "one_hot = vaex.ml.OneHotEncoder(features=['deck', 'family_size', 'name_title'])\n",
    "df_train = one_hot.fit_transform(df_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:42.072684Z",
     "start_time": "2020-05-01T17:12:41.988593Z"
    }
   },
   "outputs": [],
   "source": [
    "# Standard scale numerical features\n",
    "standard_scaler = vaex.ml.StandardScaler(features=['age', 'fare', 'fare_per_family_member'])\n",
    "df_train = standard_scaler.fit_transform(df_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:42.088401Z",
     "start_time": "2020-05-01T17:12:42.076102Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['deck_A',\n",
       " 'deck_B',\n",
       " 'deck_C',\n",
       " 'deck_D',\n",
       " 'deck_E',\n",
       " 'deck_F',\n",
       " 'deck_G',\n",
       " 'deck_M',\n",
       " 'family_size_1',\n",
       " 'family_size_2',\n",
       " 'family_size_3',\n",
       " 'family_size_4',\n",
       " 'family_size_5',\n",
       " 'family_size_6',\n",
       " 'family_size_7',\n",
       " 'family_size_8',\n",
       " 'family_size_11',\n",
       " 'standard_scaled_age',\n",
       " 'standard_scaled_fare',\n",
       " 'standard_scaled_fare_per_family_member',\n",
       " 'label_encoded_sex']"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Get the features for training a linear model\n",
    "features_linear = df_train.get_column_names(regex='^deck_|^family_size_|^frequency_encoded_name_title_')\n",
    "features_linear += df_train.get_column_names(regex='^standard_scaled_')\n",
    "features_linear += ['label_encoded_sex']\n",
    "features_linear"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Estimators: `SVC` and `LogisticRegression`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:42.170145Z",
     "start_time": "2020-05-01T17:12:42.095159Z"
    }
   },
   "outputs": [],
   "source": [
    "from sklearn.svm import SVC\n",
    "from sklearn.linear_model import LogisticRegression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:42.646357Z",
     "start_time": "2020-05-01T17:12:42.172042Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/jovan/miniconda3/lib/python3.7/site-packages/sklearn/svm/_base.py:258: ConvergenceWarning: Solver terminated early (max_iter=1000).  Consider pre-processing your data with StandardScaler or MinMaxScaler.\n",
      "  % self.max_iter, ConvergenceWarning)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                            </th><th style=\"text-align: right;\">  pclass</th><th>survived  </th><th>name                            </th><th>sex   </th><th style=\"text-align: right;\">  age</th><th style=\"text-align: right;\">  sibsp</th><th style=\"text-align: right;\">  parch</th><th>ticket   </th><th style=\"text-align: right;\">   fare</th><th>cabin  </th><th>embarked  </th><th>boat  </th><th style=\"text-align: right;\">  body</th><th>home_dest           </th><th>name_title  </th><th style=\"text-align: right;\">  name_num_words</th><th>deck  </th><th style=\"text-align: right;\">  multi_cabin</th><th style=\"text-align: right;\">  has_cabin</th><th style=\"text-align: right;\">  family_size</th><th style=\"text-align: right;\">  is_alone</th><th style=\"text-align: right;\">  age_times_class</th><th style=\"text-align: right;\">  fare_per_family_member</th><th style=\"text-align: right;\">  label_encoded_sex</th><th style=\"text-align: right;\">  label_encoded_embarked</th><th style=\"text-align: right;\">  label_encoded_deck</th><th style=\"text-align: right;\">  frequency_encoded_name_title</th><th style=\"text-align: right;\">  prediction_xgb</th><th style=\"text-align: right;\">  deck_A</th><th style=\"text-align: right;\">  deck_B</th><th style=\"text-align: right;\">  deck_C</th><th style=\"text-align: right;\">  deck_D</th><th style=\"text-align: right;\">  deck_E</th><th style=\"text-align: right;\">  deck_F</th><th style=\"text-align: right;\">  deck_G</th><th style=\"text-align: right;\">  deck_M</th><th style=\"text-align: right;\">  family_size_1</th><th style=\"text-align: right;\">  family_size_2</th><th style=\"text-align: right;\">  family_size_3</th><th style=\"text-align: right;\">  family_size_4</th><th style=\"text-align: right;\">  family_size_5</th><th style=\"text-align: right;\">  family_size_6</th><th style=\"text-align: right;\">  family_size_7</th><th style=\"text-align: right;\">  family_size_8</th><th style=\"text-align: right;\">  family_size_11</th><th style=\"text-align: right;\">  name_title_Capt</th><th style=\"text-align: right;\">  name_title_Col</th><th style=\"text-align: right;\">  name_title_Countess</th><th style=\"text-align: right;\">  name_title_Don</th><th style=\"text-align: right;\">  name_title_Dona</th><th style=\"text-align: right;\">  name_title_Dr</th><th style=\"text-align: right;\">  name_title_Jonkheer</th><th style=\"text-align: right;\">  name_title_Lady</th><th style=\"text-align: right;\">  name_title_Major</th><th style=\"text-align: right;\">  name_title_Master</th><th style=\"text-align: right;\">  name_title_Miss</th><th style=\"text-align: right;\">  name_title_Mlle</th><th style=\"text-align: right;\">  name_title_Mme</th><th style=\"text-align: right;\">  name_title_Mr</th><th style=\"text-align: right;\">  name_title_Mrs</th><th style=\"text-align: right;\">  name_title_Ms</th><th style=\"text-align: right;\">  name_title_Rev</th><th style=\"text-align: right;\">  standard_scaled_age</th><th style=\"text-align: right;\">  standard_scaled_fare</th><th style=\"text-align: right;\">  standard_scaled_fare_per_family_member</th><th>prediction_svc  </th><th>prediction_lr  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>Stoytcheff, Mr. Ilia            </td><td>male  </td><td style=\"text-align: right;\">   19</td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>349205   </td><td style=\"text-align: right;\"> 7.8958</td><td>M      </td><td>S         </td><td>--    </td><td style=\"text-align: right;\">   nan</td><td>--                  </td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">               57</td><td style=\"text-align: right;\">                  7.8958</td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">            -0.807704</td><td style=\"text-align: right;\">             -0.493719</td><td style=\"text-align: right;\">                               -0.342804</td><td>False           </td><td>False          </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i></td><td style=\"text-align: right;\">       1</td><td>False     </td><td>Payne, Mr. Vivian Ponsonby      </td><td>male  </td><td style=\"text-align: right;\">   23</td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>12749    </td><td style=\"text-align: right;\">93.5   </td><td>B24    </td><td>S         </td><td>--    </td><td style=\"text-align: right;\">   nan</td><td>Montreal, PQ        </td><td>Mr          </td><td style=\"text-align: right;\">               4</td><td>B     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">               23</td><td style=\"text-align: right;\">                 93.5   </td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   1</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">            -0.492921</td><td style=\"text-align: right;\">              1.19613 </td><td style=\"text-align: right;\">                                1.99718 </td><td>False           </td><td>True           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i></td><td style=\"text-align: right;\">       3</td><td>True      </td><td>Abbott, Mrs. Stanton (Rosa Hunt)</td><td>female</td><td style=\"text-align: right;\">   35</td><td style=\"text-align: right;\">      1</td><td style=\"text-align: right;\">      1</td><td>C.A. 2673</td><td style=\"text-align: right;\">20.25  </td><td>M      </td><td>S         </td><td>A     </td><td style=\"text-align: right;\">   nan</td><td>East Providence, RI </td><td>Mrs         </td><td style=\"text-align: right;\">               5</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            3</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">              105</td><td style=\"text-align: right;\">                  6.75  </td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.145177</td><td style=\"text-align: right;\">               1</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">             0.45143 </td><td style=\"text-align: right;\">             -0.249845</td><td style=\"text-align: right;\">                               -0.374124</td><td>True            </td><td>True           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i></td><td style=\"text-align: right;\">       2</td><td>True      </td><td>Hocking, Miss. Ellen &quot;Nellie&quot;   </td><td>female</td><td style=\"text-align: right;\">   20</td><td style=\"text-align: right;\">      2</td><td style=\"text-align: right;\">      1</td><td>29105    </td><td style=\"text-align: right;\">23     </td><td>M      </td><td>S         </td><td>4     </td><td style=\"text-align: right;\">   nan</td><td>Cornwall / Akron, OH</td><td>Miss        </td><td style=\"text-align: right;\">               4</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            4</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">               40</td><td style=\"text-align: right;\">                  5.75  </td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.201528</td><td style=\"text-align: right;\">               1</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                1</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">            -0.729008</td><td style=\"text-align: right;\">             -0.195559</td><td style=\"text-align: right;\">                               -0.401459</td><td>True            </td><td>True           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>Nilsson, Mr. August Ferdinand   </td><td>male  </td><td style=\"text-align: right;\">   21</td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>350410   </td><td style=\"text-align: right;\"> 7.8542</td><td>M      </td><td>S         </td><td>--    </td><td style=\"text-align: right;\">   nan</td><td>--                  </td><td>Mr          </td><td style=\"text-align: right;\">               4</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">               63</td><td style=\"text-align: right;\">                  7.8542</td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">            -0.650312</td><td style=\"text-align: right;\">             -0.494541</td><td style=\"text-align: right;\">                               -0.343941</td><td>False           </td><td>False          </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "  #    pclass  survived    name                              sex       age    sibsp    parch  ticket        fare  cabin    embarked    boat      body  home_dest             name_title      name_num_words  deck      multi_cabin    has_cabin    family_size    is_alone    age_times_class    fare_per_family_member    label_encoded_sex    label_encoded_embarked    label_encoded_deck    frequency_encoded_name_title    prediction_xgb    deck_A    deck_B    deck_C    deck_D    deck_E    deck_F    deck_G    deck_M    family_size_1    family_size_2    family_size_3    family_size_4    family_size_5    family_size_6    family_size_7    family_size_8    family_size_11    name_title_Capt    name_title_Col    name_title_Countess    name_title_Don    name_title_Dona    name_title_Dr    name_title_Jonkheer    name_title_Lady    name_title_Major    name_title_Master    name_title_Miss    name_title_Mlle    name_title_Mme    name_title_Mr    name_title_Mrs    name_title_Ms    name_title_Rev    standard_scaled_age    standard_scaled_fare    standard_scaled_fare_per_family_member  prediction_svc    prediction_lr\n",
       "  0         3  False       Stoytcheff, Mr. Ilia              male       19        0        0  349205      7.8958  M        S           --         nan  --                    Mr                           3  M                   0            1              1           0                 57                    7.8958                    1                         1                     0                        0.578797                 0         0         0         0         0         0         0         0         1                1                0                0                0                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  0                  0                 0                1                 0                0                 0              -0.807704               -0.493719                                 -0.342804  False             False\n",
       "  1         1  False       Payne, Mr. Vivian Ponsonby        male       23        0        0  12749      93.5     B24      S           --         nan  Montreal, PQ          Mr                           4  B                   0            1              1           0                 23                   93.5                       1                         1                     1                        0.578797                 0         0         1         0         0         0         0         0         0                1                0                0                0                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  0                  0                 0                1                 0                0                 0              -0.492921                1.19613                                   1.99718   False             True\n",
       "  2         3  True        Abbott, Mrs. Stanton (Rosa Hunt)  female     35        1        1  C.A. 2673  20.25    M        S           A          nan  East Providence, RI   Mrs                          5  M                   0            1              3           0                105                    6.75                      0                         1                     0                        0.145177                 1         0         0         0         0         0         0         0         1                0                0                1                0                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  0                  0                 0                0                 1                0                 0               0.45143                -0.249845                                 -0.374124  True              True\n",
       "  3         2  True        Hocking, Miss. Ellen \"Nellie\"     female     20        2        1  29105      23       M        S           4          nan  Cornwall / Akron, OH  Miss                         4  M                   0            1              4           0                 40                    5.75                      0                         1                     0                        0.201528                 1         0         0         0         0         0         0         0         1                0                0                0                1                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  1                  0                 0                0                 0                0                 0              -0.729008               -0.195559                                 -0.401459  True              True\n",
       "  4         3  False       Nilsson, Mr. August Ferdinand     male       21        0        0  350410      7.8542  M        S           --         nan  --                    Mr                           4  M                   0            1              1           0                 63                    7.8542                    1                         1                     0                        0.578797                 0         0         0         0         0         0         0         0         1                1                0                0                0                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  0                  0                 0                1                 0                0                 0              -0.650312               -0.494541                                 -0.343941  False             False"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The Support Vector Classifier\n",
    "vaex_svc = vaex.ml.sklearn.Predictor(features=features_linear, \n",
    "                                     target='survived',\n",
    "                                     model=SVC(max_iter=1000, random_state=42),\n",
    "                                     prediction_name='prediction_svc')\n",
    "\n",
    "# Logistic Regression\n",
    "vaex_logistic = vaex.ml.sklearn.Predictor(features=features_linear, \n",
    "                                          target='survived',\n",
    "                                          model=LogisticRegression(max_iter=1000, random_state=42),\n",
    "                                          prediction_name='prediction_lr')\n",
    "\n",
    "# Train the new models and apply the transformation to the train dataframe\n",
    "for model in [vaex_svc, vaex_logistic]:\n",
    "    model.fit(df_train)\n",
    "    df_train = model.transform(df_train)\n",
    "    \n",
    "# Preview of the train DataFrame\n",
    "df_train.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Ensemble\n",
    "\n",
    "Just as before, the predictions from the `SVC` and the `LogisticRegression` classifiers are added as virtual columns in the training dataset. This is quite powerful, since now we can easily use them to create an ensemble! For example, let's do a weighted mean."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:42.958447Z",
     "start_time": "2020-05-01T17:12:42.653715Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                                </th><th>prediction_xgb  </th><th>prediction_svc  </th><th>prediction_lr  </th><th>prediction_final  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i>    </td><td>0               </td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i>    </td><td>0               </td><td>False           </td><td>True           </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i>    </td><td>1               </td><td>True            </td><td>True           </td><td>True              </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i>    </td><td>1               </td><td>True            </td><td>True           </td><td>True              </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i>    </td><td>0               </td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "<tr><td>...                              </td><td>...             </td><td>...             </td><td>...            </td><td>...               </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,042</i></td><td>0               </td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,043</i></td><td>0               </td><td>True            </td><td>True           </td><td>True              </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,044</i></td><td>1               </td><td>True            </td><td>False          </td><td>True              </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,045</i></td><td>0               </td><td>True            </td><td>True           </td><td>True              </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,046</i></td><td>0               </td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "#      prediction_xgb    prediction_svc    prediction_lr    prediction_final\n",
       "0      0                 False             False            False\n",
       "1      0                 False             True             False\n",
       "2      1                 True              True             True\n",
       "3      1                 True              True             True\n",
       "4      0                 False             False            False\n",
       "...    ...               ...               ...              ...\n",
       "1,042  0                 False             False            False\n",
       "1,043  0                 True              True             True\n",
       "1,044  1                 True              False            True\n",
       "1,045  0                 True              True             True\n",
       "1,046  0                 False             False            False"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Weighed mean of the classes\n",
    "prediction_final = (df_train.prediction_xgb.astype('int') * 0.3 + \n",
    "                    df_train.prediction_svc.astype('int') * 0.5 + \n",
    "                    df_train.prediction_xgb.astype('int') * 0.2)\n",
    "# Get the predicted class\n",
    "prediction_final = (prediction_final >= 0.5)\n",
    "# Add the expression to the train DataFrame\n",
    "df_train['prediction_final'] = prediction_final\n",
    "\n",
    "# Preview\n",
    "df_train[df_train.get_column_names(regex='^predict')]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Performance (part 2)\n",
    "\n",
    "Applying the ensembler to the test set is just as easy as before. We just need to get the new state of the training DataFrame, and transfer it to the test DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:43.334411Z",
     "start_time": "2020-05-01T17:12:42.961373Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                            </th><th style=\"text-align: right;\">  pclass</th><th>survived  </th><th>name                                        </th><th>sex   </th><th style=\"text-align: right;\">   age</th><th style=\"text-align: right;\">  sibsp</th><th style=\"text-align: right;\">  parch</th><th>ticket          </th><th style=\"text-align: right;\">   fare</th><th>cabin  </th><th>embarked  </th><th>boat  </th><th style=\"text-align: right;\">  body</th><th>home_dest               </th><th>name_title  </th><th style=\"text-align: right;\">  name_num_words</th><th>deck  </th><th style=\"text-align: right;\">  multi_cabin</th><th style=\"text-align: right;\">  has_cabin</th><th style=\"text-align: right;\">  family_size</th><th style=\"text-align: right;\">  is_alone</th><th style=\"text-align: right;\">  age_times_class</th><th style=\"text-align: right;\">  fare_per_family_member</th><th style=\"text-align: right;\">  label_encoded_sex</th><th style=\"text-align: right;\">  label_encoded_embarked</th><th style=\"text-align: right;\">  label_encoded_deck</th><th style=\"text-align: right;\">  frequency_encoded_name_title</th><th style=\"text-align: right;\">  prediction_xgb</th><th style=\"text-align: right;\">  deck_A</th><th style=\"text-align: right;\">  deck_B</th><th style=\"text-align: right;\">  deck_C</th><th style=\"text-align: right;\">  deck_D</th><th style=\"text-align: right;\">  deck_E</th><th style=\"text-align: right;\">  deck_F</th><th style=\"text-align: right;\">  deck_G</th><th style=\"text-align: right;\">  deck_M</th><th style=\"text-align: right;\">  family_size_1</th><th style=\"text-align: right;\">  family_size_2</th><th style=\"text-align: right;\">  family_size_3</th><th style=\"text-align: right;\">  family_size_4</th><th style=\"text-align: right;\">  family_size_5</th><th style=\"text-align: right;\">  family_size_6</th><th style=\"text-align: right;\">  family_size_7</th><th style=\"text-align: right;\">  family_size_8</th><th style=\"text-align: right;\">  family_size_11</th><th style=\"text-align: right;\">  name_title_Capt</th><th style=\"text-align: right;\">  name_title_Col</th><th style=\"text-align: right;\">  name_title_Countess</th><th style=\"text-align: right;\">  name_title_Don</th><th style=\"text-align: right;\">  name_title_Dona</th><th style=\"text-align: right;\">  name_title_Dr</th><th style=\"text-align: right;\">  name_title_Jonkheer</th><th style=\"text-align: right;\">  name_title_Lady</th><th style=\"text-align: right;\">  name_title_Major</th><th style=\"text-align: right;\">  name_title_Master</th><th style=\"text-align: right;\">  name_title_Miss</th><th style=\"text-align: right;\">  name_title_Mlle</th><th style=\"text-align: right;\">  name_title_Mme</th><th style=\"text-align: right;\">  name_title_Mr</th><th style=\"text-align: right;\">  name_title_Mrs</th><th style=\"text-align: right;\">  name_title_Ms</th><th style=\"text-align: right;\">  name_title_Rev</th><th style=\"text-align: right;\">  standard_scaled_age</th><th style=\"text-align: right;\">  standard_scaled_fare</th><th style=\"text-align: right;\">  standard_scaled_fare_per_family_member</th><th>prediction_svc  </th><th>prediction_lr  </th><th>prediction_final  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>O&#x27;Connor, Mr. Patrick                       </td><td>male  </td><td style=\"text-align: right;\">28.032</td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>366713          </td><td style=\"text-align: right;\"> 7.75  </td><td>M      </td><td>Q         </td><td>--    </td><td style=\"text-align: right;\">   nan</td><td>--                      </td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           84.096</td><td style=\"text-align: right;\">                  7.75  </td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       2</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">           -0.096924 </td><td style=\"text-align: right;\">             -0.496597</td><td style=\"text-align: right;\">                               -0.346789</td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>Canavan, Mr. Patrick                        </td><td>male  </td><td style=\"text-align: right;\">21    </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>364858          </td><td style=\"text-align: right;\"> 7.75  </td><td>M      </td><td>Q         </td><td>--    </td><td style=\"text-align: right;\">   nan</td><td>Ireland Philadelphia, PA</td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           63    </td><td style=\"text-align: right;\">                  7.75  </td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       2</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">           -0.650312 </td><td style=\"text-align: right;\">             -0.496597</td><td style=\"text-align: right;\">                               -0.346789</td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i></td><td style=\"text-align: right;\">       1</td><td>False     </td><td>Ovies y Rodriguez, Mr. Servando             </td><td>male  </td><td style=\"text-align: right;\">28.5  </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>PC 17562        </td><td style=\"text-align: right;\">27.7208</td><td>D43    </td><td>C         </td><td>--    </td><td style=\"text-align: right;\">   189</td><td>?Havana, Cuba           </td><td>Mr          </td><td style=\"text-align: right;\">               5</td><td>D     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           28.5  </td><td style=\"text-align: right;\">                 27.7208</td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   4</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">               1</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">           -0.0600935</td><td style=\"text-align: right;\">             -0.102369</td><td style=\"text-align: right;\">                                0.19911 </td><td>False           </td><td>False          </td><td>True              </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>Windelov, Mr. Einar                         </td><td>male  </td><td style=\"text-align: right;\">21    </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>SOTON/OQ 3101317</td><td style=\"text-align: right;\"> 7.25  </td><td>M      </td><td>S         </td><td>--    </td><td style=\"text-align: right;\">   nan</td><td>--                      </td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           63    </td><td style=\"text-align: right;\">                  7.25  </td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">           -0.650312 </td><td style=\"text-align: right;\">             -0.506468</td><td style=\"text-align: right;\">                               -0.360456</td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i></td><td style=\"text-align: right;\">       2</td><td>True      </td><td>Shelley, Mrs. William (Imanita Parrish Hall)</td><td>female</td><td style=\"text-align: right;\">25    </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      1</td><td>230433          </td><td style=\"text-align: right;\">26     </td><td>M      </td><td>S         </td><td>12    </td><td style=\"text-align: right;\">   nan</td><td>Deer Lodge, MT          </td><td>Mrs         </td><td style=\"text-align: right;\">               6</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            2</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           50    </td><td style=\"text-align: right;\">                 13     </td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.145177</td><td style=\"text-align: right;\">               1</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">           -0.335529 </td><td style=\"text-align: right;\">             -0.136338</td><td style=\"text-align: right;\">                               -0.203281</td><td>True            </td><td>True           </td><td>True              </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "  #    pclass  survived    name                                          sex        age    sibsp    parch  ticket               fare  cabin    embarked    boat      body  home_dest                 name_title      name_num_words  deck      multi_cabin    has_cabin    family_size    is_alone    age_times_class    fare_per_family_member    label_encoded_sex    label_encoded_embarked    label_encoded_deck    frequency_encoded_name_title    prediction_xgb    deck_A    deck_B    deck_C    deck_D    deck_E    deck_F    deck_G    deck_M    family_size_1    family_size_2    family_size_3    family_size_4    family_size_5    family_size_6    family_size_7    family_size_8    family_size_11    name_title_Capt    name_title_Col    name_title_Countess    name_title_Don    name_title_Dona    name_title_Dr    name_title_Jonkheer    name_title_Lady    name_title_Major    name_title_Master    name_title_Miss    name_title_Mlle    name_title_Mme    name_title_Mr    name_title_Mrs    name_title_Ms    name_title_Rev    standard_scaled_age    standard_scaled_fare    standard_scaled_fare_per_family_member  prediction_svc    prediction_lr    prediction_final\n",
       "  0         3  False       O'Connor, Mr. Patrick                         male    28.032        0        0  366713             7.75    M        Q           --         nan  --                        Mr                           3  M                   0            1              1           0             84.096                    7.75                      1                         2                     0                        0.578797                 0         0         0         0         0         0         0         0         1                1                0                0                0                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  0                  0                 0                1                 0                0                 0             -0.096924                -0.496597                                 -0.346789  False             False            False\n",
       "  1         3  False       Canavan, Mr. Patrick                          male    21            0        0  364858             7.75    M        Q           --         nan  Ireland Philadelphia, PA  Mr                           3  M                   0            1              1           0             63                        7.75                      1                         2                     0                        0.578797                 0         0         0         0         0         0         0         0         1                1                0                0                0                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  0                  0                 0                1                 0                0                 0             -0.650312                -0.496597                                 -0.346789  False             False            False\n",
       "  2         1  False       Ovies y Rodriguez, Mr. Servando               male    28.5          0        0  PC 17562          27.7208  D43      C           --         189  ?Havana, Cuba             Mr                           5  D                   0            1              1           0             28.5                     27.7208                    1                         0                     4                        0.578797                 1         0         0         0         1         0         0         0         0                1                0                0                0                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  0                  0                 0                1                 0                0                 0             -0.0600935               -0.102369                                  0.19911   False             False            True\n",
       "  3         3  False       Windelov, Mr. Einar                           male    21            0        0  SOTON/OQ 3101317   7.25    M        S           --         nan  --                        Mr                           3  M                   0            1              1           0             63                        7.25                      1                         1                     0                        0.578797                 0         0         0         0         0         0         0         0         1                1                0                0                0                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  0                  0                 0                1                 0                0                 0             -0.650312                -0.506468                                 -0.360456  False             False            False\n",
       "  4         2  True        Shelley, Mrs. William (Imanita Parrish Hall)  female  25            0        1  230433            26       M        S           12         nan  Deer Lodge, MT            Mrs                          6  M                   0            1              2           0             50                       13                         0                         1                     0                        0.145177                 1         0         0         0         0         0         0         0         1                0                1                0                0                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  0                  0                 0                0                 1                0                 0             -0.335529                -0.136338                                 -0.203281  True              True             True"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# State transfer\n",
    "state_new = df_train.state_get()\n",
    "df_test.state_set(state_new)\n",
    "\n",
    "# Preview\n",
    "df_test.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, let's check the performance of all the individual models as well as on the ensembler, on the test set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:43.490196Z",
     "start_time": "2020-05-01T17:12:43.337368Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "prediction_xgb\n",
      "Accuracy: 0.786\n",
      "f1 score: 0.728\n",
      "roc-auc: 0.773\n",
      " \n",
      "prediction_svc\n",
      "Accuracy: 0.802\n",
      "f1 score: 0.743\n",
      "roc-auc: 0.786\n",
      " \n",
      "prediction_lr\n",
      "Accuracy: 0.779\n",
      "f1 score: 0.713\n",
      "roc-auc: 0.762\n",
      " \n",
      "prediction_final\n",
      "Accuracy: 0.809\n",
      "f1 score: 0.771\n",
      "roc-auc: 0.804\n",
      " \n"
     ]
    }
   ],
   "source": [
    "pred_columns = df_train.get_column_names(regex='^prediction_')\n",
    "for i in pred_columns:\n",
    "    print(i)\n",
    "    binary_metrics(y_true=df_test.survived.values, y_pred=df_test[i].values)\n",
    "    print(' ')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that our ensembler is doing a better job than any idividual model, as expected.\n",
    "\n",
    "Thanks you for going over this example. Feel free to copy, modify, and in general play around with this notebook."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
